Objectives: This three-year research effort was conducted at Texas A&M University by the principal investigator, Dr. Peter F. Stiller, and a number of graduate research assistants.

We begin this report by reviewing the project's objectives as outlined in the original proposal abstract.

The general problem of single-view recognition is central to many target recognition and computer vision tasks. For example, efficiently recognizing three-dimensional arrangements of features (such as geometric configurations of lines and/or points) from a single two-dimensional view is a key research question. A solution will require an approach that is view and pose independent. Unfortunately, existing methods often rely on computationally expensive template matching that is, strictly speaking, not view or pose invariant. Instead, those methods compare against templates created for each possible view, with the infinite number of possibilities approximated by some finite number of views. To carry out an invariant approach to target recognition, we must seek out properties and relationships that are geometrically intrinsic to the objects and/or images being compared. We must also be aware that different sensors necessitate different models of image formation and therefore different forms of invariance. Radar and Ladar make use of an orthographic model, while most optical sensors use a weak perspective or full perspective model.
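To make the distinction between sensor models concrete, the sketch below projects a 3D point under three simplified models: a 1D orthographic projection onto a line of sight (a stand-in for a radar range measurement), scaled orthographic (weak perspective) projection, and pin-hole (full perspective) projection. The function names, the rotation about the z-axis, and all parameter values are illustrative assumptions rather than the report's sensor models; the report's "generalized" weak perspective allows an arbitrary affine map, of which the scaled orthographic case here is only a special instance.

```python
import math

def rotation_z(theta):
    """3x3 rotation about the z-axis; an illustrative pose, not a general 3D rotation."""
    c, s = math.cos(theta), math.sin(theta)
    return [[c, -s, 0.0], [s, c, 0.0], [0.0, 0.0, 1.0]]

def rotate(R, p):
    """Multiply the 3x3 matrix R by the 3-vector p."""
    return [sum(R[i][j] * p[j] for j in range(3)) for i in range(3)]

def orthographic_1d(p, line_of_sight):
    """Radar-style range: orthogonal projection of p onto a unit line-of-sight vector."""
    return sum(a * b for a, b in zip(p, line_of_sight))

def weak_perspective(p, R, scale, t):
    """Scaled orthographic projection: rotate, drop depth, scale, then translate in the image."""
    q = rotate(R, p)
    return (scale * q[0] + t[0], scale * q[1] + t[1])

def full_perspective(p, R, t, f):
    """Pin-hole projection: move p into the camera frame, then divide by depth."""
    x, y, z = (qi + ti for qi, ti in zip(rotate(R, p), t))
    if z <= 0:
        raise ValueError("point lies behind the camera")
    return (f * x / z, f * y / z)

if __name__ == "__main__":
    p = (1.0, 2.0, 3.0)
    R = rotation_z(0.0)
    print(orthographic_1d(p, (0.0, 0.0, 1.0)))
    print(weak_perspective(p, R, 2.0, (1.0, 1.0)))
    print(full_perspective(p, R, (0.0, 0.0, 1.0), 1.0))
```

The division by depth in `full_perspective` is what destroys affine relationships and forces projective invariants in the optical case.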

Consideration of view and pose independence, as well as a desire for a coordinate-independent formulation, leads naturally to characterizing configurations of features by their 3D or 2D geometric invariants. The specific group (Euclidean group, similarity or conformal group, affine group, or projective general linear group) to which things should be invariant is a function of the sensor type. We also need to determine a fundamental set of equations that express the relationship between the 3D geometry and its "residual" in a 2D (or 1D) image. These are known as object/image (O/I) equations. They should completely and invariantly describe the mutual 3D/2D constraints. Once found, they can be exploited in a number of ways. For example, from a given 2D configuration, one could use the O/I equations to derive a set of nonlinear constraints on the geometric invariants of the 3D configurations capable of producing that 2D configuration, and thus arrive at a test for determining the object being viewed. Conversely, given a 3D geometric configuration (features on an object), one could derive a set of equations that constrain the invariants of the images of that object, helping us determine whether that particular object appears in various images.

We propose to create so-called "global" forms for the object/image equations, study their properties (especially under geometric degeneracy), and exploit them to develop new algorithms for target recognition. This will require using advanced mathematical techniques from algebraic and differential geometry to construct generalized shape spaces for various projection and sensor models. We will use that construction to find natural metrics that express the distance (difference) between two object configurations, two image configurations, or an object and an image pair. These metrics should produce the most robust tests for target identification, at least as far as target geometry is concerned.

Moreover, such metrics will provide the basis for efficient hashing schemes to do target identification quickly, and they will provide a rigorous foundation for error analysis in the automatic target recognition (ATR) process.

A summary list of the research topics included in this proposal appears below:

Proposed Tasks and Problems

1. Object/Image Relations and Shape Spaces

2. Extending the O/I Formulation to Other Feature Sets and Sensor Models

3. Symbolic Computation and Alternative Methods for Computing the O/I Equations

4. Metrics

5. Recognizing Articulated Objects

6. Geometric Hashing

7. Unlabeled Matching

8. Shapelets

9. Noise Analysis

10. Performance Prediction

11. Technology Transfer


Status of Effort: (Period of Performance 6/1/04 to 9/30/07)

At the time of this writing, the grant has ended, having been ongoing for the previous 40 months.

Recall that in previous AFOSR-sponsored work we were able to achieve several important results, including the understanding, development, and analysis of a global approach to invariants and object/image equations in the generalized weak perspective (affine) case. That work also included our initial construction of a new class of discrimination metrics that are generalizations of the classical Procrustes metric of statistical shape theory. In the first instance, we provided a complete dictionary between the old algebraic approach to invariants and the new, more geometric, global approach. This was worked out completely in the generalized weak perspective case and appears in our paper "Object/Image Relations, Shape Spaces, and Metrics" and more recently in a book chapter entitled "Object-Image Metrics for Generalized Weak Perspective Projection," in Statistics and Analysis of Shapes, edited by Hamid Krim and Anthony Yezzi, Jr. and published in 2006 by Birkhäuser. This new approach creates a geometric framework for discrimination theory and a more robust approach to recognition. Some of the main ideas and their application to the full perspective (optical) case were presented in our paper "Global Invariant Methods for Object Recognition," described in a previous report. New results on this topic have just appeared in our paper "Recognizing Point Configurations in Full Perspective," which was written jointly with our graduate student Kevin Abbott for the Electronic Imaging conference, Vision Geometry XV.


Overall, our global approach provides a way to explore the behavior of recognition algorithms when dealing with multiplicities or geometric degeneracies (which cannot be handled with other methods). The difficulty in using the classical numerical invariants for this purpose is that they are only rational functions on the appropriate quotient variety. As such, they are not always defined: they fail wherever their denominators vanish. This leads to serious numerical problems in any algorithm based on these invariants. To remedy these problems, we succeeded in replacing these invariants by points in a Grassmann manifold in the weak perspective case, or by certain geometric objects, namely toric subvarieties of Grassmannians, in the full perspective case. The object/image equations become the expression of certain incidence relations in the weak perspective case or, in the full perspective case, certain "resultant-like" expressions for the existence of a nontrivial intersection of the toric subvarieties with certain Schubert varieties in the Grassmannian. This "global" approach to invariants is providing more robust object recognition algorithms. Moreover, by representing the relevant shape spaces as varieties embedded in projective space, we can endow each shape space with a metric by restricting the standard Fubini-Study metric. These ideas are discussed in our paper "Object Recognition from a Global Geometric Perspective - Invariants and Metrics." This approach produces a natural metric on both the object and the image space that can be exploited to create an effective discrimination theory (i.e., a meaningful notion of "distance" between objects, between images, and between an object and an image). Finally, several new directions have emerged from our work. These directions have been incorporated into our research and include the study of object/image equations for unordered point features to facilitate point cloud matching, research on object/image equations with parameters to handle articulation of objects, the investigation of invariant point-to-surface matching, 3D reconstruction from motion, and the statistics of shape for noise analysis. Progress on these will expand the recognition power of our approach and its applicability to Air Force problems.
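The weak perspective incidence relation can be sketched numerically. One common way to encode the affine shape of k points in n dimensions (assumed here; the report's exact normalization may differ) is the column space of the k x (n+1) matrix whose rows are the point coordinates followed by a 1. An affine map from R^3 to R^2 carries the object points to the image points exactly when the image subspace lies inside the object subspace, so a residual measuring how far the image subspace sticks out of the object subspace should vanish for a true match. The function names and tolerances are hypothetical, and this residual test is an illustration, not the report's global O/I equations or its metric.

```python
import random

def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

def shape_columns(points):
    """Columns of the k x (n+1) matrix whose i-th row is (coordinates of point i, 1)."""
    n = len(points[0])
    cols = [[p[j] for p in points] for j in range(n)]
    cols.append([1.0] * len(points))
    return cols

def orthonormal_basis(vectors, tol=1e-10):
    """Modified Gram-Schmidt; vectors that are numerically dependent are dropped."""
    basis = []
    for v in vectors:
        w = list(v)
        for b in basis:
            c = dot(w, b)
            w = [wi - c * bi for wi, bi in zip(w, b)]
        norm = dot(w, w) ** 0.5
        if norm > tol:
            basis.append([wi / norm for wi in w])
    return basis

def incidence_residual(object_pts, image_pts):
    """Largest distance from an orthonormal image-subspace basis vector to the object subspace.

    Zero (up to round-off) exactly when the image subspace lies inside the object
    subspace, i.e. when some affine map R^3 -> R^2 sends the object points to the image points.
    """
    obj = orthonormal_basis(shape_columns(object_pts))
    worst = 0.0
    for v in orthonormal_basis(shape_columns(image_pts)):
        r = list(v)
        for b in obj:
            c = dot(v, b)
            r = [ri - c * bi for ri, bi in zip(r, b)]
        worst = max(worst, dot(r, r) ** 0.5)
    return worst

if __name__ == "__main__":
    random.seed(0)
    obj = [[random.uniform(-1, 1) for _ in range(3)] for _ in range(6)]
    A, t = [[0.9, -0.3, 0.2], [0.1, 0.8, -0.5]], [2.0, -1.0]
    img = [[dot(A[r], p) + t[r] for r in range(2)] for p in obj]
    other = [[random.uniform(-1, 1) for _ in range(3)] for _ in range(6)]
    print(incidence_residual(obj, img))    # ~0: img is an affine image of obj
    print(incidence_residual(other, img))  # clearly nonzero for an unrelated object
```

Because the test works with subspaces rather than ratios of coordinates, it never divides by a quantity that can vanish, which is the practical point of replacing rational invariants by Grassmannian points.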


Accomplishments/Findings:

We report below (in summary form) on several significant areas of progress. Details can be found in the listed papers.

1. Shape Statistics

Kendall pioneered statistical shape theory for point features in the plane under similarity transformations. Among his results is a description of the distribution of shapes for point features selected from independent spherical normal distributions, each with covariance matrix normalized to the 2 by 2 identity matrix and with means at selected points in the plane. One can regard this as an early attempt to introduce the idea of "noisy" shapes. An important question is to determine, for a given distribution of object shapes, the corresponding distribution of image shapes under appropriate hypotheses. This question was not addressed by Kendall or others working in this area. Building on Kendall's results in 2D, we are trying to answer the above question in a particular case involving a small number of point features in the plane under similarity transformations which are projected to 1D. This is a modified radar case where scale is


2. 3D shape reconstruction from motion.

This is the newest aspect of our work. The goal is to improve upon the ideas and methods of Mark Stuff to do 3D target geometry reconstruction from a series of 1D radar range profiles taken of a target moving relative to the sensor. We have been working with a simplified 2D-to-1D orthographic model which captures the essence of the problem. The central issue is how to find the object that best fits the data provided by the accumulated set of noisy images. In our formulation, this means finding the object whose image locus in the image shape space best fits the image data. The key is what is meant by "best fit." We argue that the natural Riemannian metric on the image shape space is the best measure to use in determining "goodness of fit," and we are attempting to design an optimal fitting procedure based on this idea.
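The fitting idea can be sketched for the simplified 2D-to-1D orthographic model. The sketch assumes (our assumption, since these details are not fixed here) that range scale is known, so translation along the line of sight is the only nuisance; an image shape is then a centered range profile, and Euclidean distance between centered profiles stands in for the shape-space metric. The image locus of a planar object is the closed curve of its centered projections as the viewing angle varies, and the score sums, over all observed profiles, the squared distance to that curve. The angle grid and all names are illustrative; this is neither Mark Stuff's method nor the optimal procedure described above.

```python
import math

def centered(values):
    """Remove the unknown translation along the line of sight."""
    m = sum(values) / len(values)
    return [v - m for v in values]

def project_1d(points, theta):
    """1D orthographic image of planar points viewed along angle theta."""
    c, s = math.cos(theta), math.sin(theta)
    return [x * c + y * s for x, y in points]

def image_distance(obj_pts, profile, steps=3600):
    """Squared distance from one centered profile to the object's image locus (grid over angles)."""
    r = centered(profile)
    best = float("inf")
    for j in range(steps):
        p = centered(project_1d(obj_pts, 2 * math.pi * j / steps))
        best = min(best, sum((a - b) ** 2 for a, b in zip(r, p)))
    return best

def goodness_of_fit(obj_pts, profiles, steps=3600):
    """Total misfit of an object hypothesis to a set of observed range profiles."""
    return sum(image_distance(obj_pts, r, steps) for r in profiles)

if __name__ == "__main__":
    true_obj = [(0.0, 0.0), (1.0, 0.2), (0.3, 1.1), (-0.7, 0.5)]
    other_obj = [(0.0, 0.0), (1.0, -0.8), (0.9, 1.1), (-0.2, 0.4)]
    # simulated profiles of true_obj at three grid angles, with an arbitrary range offset
    profiles = [[v + 5.0 for v in project_1d(true_obj, 2 * math.pi * j / 3600)]
                for j in (100, 700, 1900)]
    print(goodness_of_fit(true_obj, profiles))   # essentially zero
    print(goodness_of_fit(other_obj, profiles))  # larger: other_obj fits worse
```

With noisy profiles, the object hypothesis would be adjusted to minimize this score; the grid search over angles is the simplest way to evaluate the distance to the locus and could be replaced by a closed-form or iterative minimization.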


3. Global descriptions of shape spaces in the orthographic (radar) case.

In order to carry out our program of developing the global version of the object/image equations and object/image metrics for the orthographic case, it is necessary to understand how the shape spaces for point features in this case isometrically embed in standard Euclidean space. For small numbers of point features in 1D this is relatively easy, but for greater numbers of points in 1D and any number in 2D or 3D, this becomes a harder problem. It essentially amounts to finding an isometric embedding of real or complex projective space into a Euclidean space (real or complex) of as low a dimension as possible, and then extending this embedding to a certain cone over the projective space. We have been able to do this, paving the way for the full development of our approach to recognition in the radar case.
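One standard way to embed projective space in a Euclidean space, shown here only as an illustration (which embedding the report uses is not stated here), sends the line through a nonzero vector z to the orthogonal projector z z*/|z|^2 inside the real vector space of Hermitian (or, for real z, symmetric) matrices. The map is well defined on projective space because the projector ignores scalar multiples of z, and the sketch checks numerically that the Frobenius (chordal) distance between projectors equals sqrt(2) sin(d), where d is the Fubini-Study distance. It does not address the low target dimension or the extension to a cone mentioned above; the function names are hypothetical.

```python
import math

def embed(z):
    """Send a nonzero real or complex vector to the projector z z^* / |z|^2 (depends only on the line)."""
    n2 = sum(abs(c) ** 2 for c in z)
    return [[z[i] * z[j].conjugate() / n2 for j in range(len(z))] for i in range(len(z))]

def chordal(P, Q):
    """Frobenius distance between two embedded points."""
    n = len(P)
    return math.sqrt(sum(abs(P[i][j] - Q[i][j]) ** 2 for i in range(n) for j in range(n)))

def fubini_study(z, w):
    """Fubini-Study distance between the lines through z and w."""
    ip = abs(sum(a * b.conjugate() for a, b in zip(z, w)))
    nz = math.sqrt(sum(abs(a) ** 2 for a in z))
    nw = math.sqrt(sum(abs(b) ** 2 for b in w))
    return math.acos(min(1.0, ip / (nz * nw)))

if __name__ == "__main__":
    z = [1 + 2j, 0.5 - 1j, 3j]
    w = [2 - 1j, 1 + 0j, -1 + 1j]
    # the two printed numbers agree: chordal distance = sqrt(2) * sin(Fubini-Study distance)
    print(chordal(embed(z), embed(w)), math.sqrt(2) * math.sin(fubini_study(z, w)))
```

The identity follows from trace(P^2) = trace(Q^2) = 1 and trace(PQ) = cos^2(d) for unit representatives, so distances in the ambient space are a monotone function of the intrinsic ones.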


4. Testing our algorithms and new applications.

Work on designing and implementing experiments to test several recognition algorithms based on our object/image metrics was carried out during the course of the project. The results were summarized in our paper "Robustness and statistical analysis of object/image metrics," which was presented at Electronic Imaging, Vision Geometry XIV, in San Jose, CA, in January 2006. The results showed that for point features in the weak perspective case, our object/image metric performs surprisingly well even in the face of sensor noise. The results also scaled well with respect to the size of the object database and showed the expected strong increase in target matching performance with each additional feature point considered. In addition, we have recently been collaborating with Ms. Olga Mendoza, a young researcher at AFRL, Wright-Patterson AFB, who has performed additional tests of the algorithms and who has an interest in applying the object/image metrics to problems in image registration and tracking.



5. The Full Perspective Case - O/I Equations and Metrics


In joint work with our Ph.D. student Kevin Abbott, we have made significant progress in the very difficult case of full perspective projection (essentially the pin-hole camera model of projection), which is important for recognizing objects in optical images. The central difficulties in this case are that the shape spaces are not well understood and that the computational complexity increases dramatically when dealing with projective invariance. We were able to make significant progress this past year, introducing the first true global version of the object/image equations in the full perspective case and the first metrics fully invariant to projective/perspective transformations. These results appear in our paper "Recognizing Point Configurations in Full Perspective," and in greater detail in Kevin Abbott's Ph.D. thesis.
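Why full perspective forces projective rather than affine invariants can be seen in the simplest case of four collinear points. Under a pin-hole projection, ratios of lengths along a line change, but the cross ratio is preserved. The sketch below, with illustrative point choices and hypothetical helper names, compares the cross ratio of four points on a 3D line with that of their images; it is a textbook invariant, not the global O/I equations of the report.

```python
import math

def cross_ratio(a, b, c, d):
    """Cross ratio (a, b; c, d) = ((c - a)(d - b)) / ((c - b)(d - a)) of four scalars."""
    return ((c - a) * (d - b)) / ((c - b) * (d - a))

def pinhole(p, f=1.0):
    """Pin-hole projection of a camera-frame point (x, y, z) with z > 0."""
    x, y, z = p
    return (f * x / z, f * y / z)

def positions_on_line(pts):
    """Signed positions of collinear 2D points, measured from pts[0] toward pts[1]."""
    (x0, y0), (x1, y1) = pts[0], pts[1]
    dx, dy = x1 - x0, y1 - y0
    n = math.hypot(dx, dy)
    return [((x - x0) * dx + (y - y0) * dy) / n for x, y in pts]

if __name__ == "__main__":
    A, D = (0.5, -1.0, 4.0), (1.0, 0.5, 1.0)   # the 3D line A + s*D lies in front of the camera
    s = [0.0, 1.0, 2.5, 4.0]
    images = [pinhole(tuple(a + si * d for a, d in zip(A, D))) for si in s]
    t = positions_on_line(images)
    print(cross_ratio(*s), cross_ratio(*t))  # equal up to round-off
    print(s[1] / s[2], t[1] / t[2])          # the simple length ratio is not preserved
```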

6. Recognizing configurations of linear features in the generalized weak perspective case.

Very little previous work has been done on the shape theory of line configurations. In the course of this project, we carried out an investigation of the problem of single-view recognition for sets of line features under generalized weak perspective projection. In particular, we derived the object/image equations for projection from 3D to 2D. Unfortunately, because this required using the Plücker coordinates of the lines, we had to fall back on standard position methods to define our invariants, meaning that the results require certain general position assumptions that we would eventually like to eliminate. In addition, this work on the generalized weak perspective case has revealed an approach to the orthographic (radar) case which we hope to flesh out shortly. Our results appear in a paper entitled "Recognition of Configurations of Lines I - Weak Perspective Case," which was published in late 2005.
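For readers unfamiliar with Plücker coordinates, the sketch below computes them for a 3D line through two points as the pair (direction d = q - p, moment m = p x q). The six numbers are determined by the line up to a common nonzero scale, and they always satisfy the quadratic Plücker relation d · m = 0. The helper names and tolerance are illustrative; how the report builds invariants from these coordinates is not reproduced here.

```python
def cross(a, b):
    """Cross product of two 3-vectors."""
    return (a[1] * b[2] - a[2] * b[1], a[2] * b[0] - a[0] * b[2], a[0] * b[1] - a[1] * b[0])

def plucker(p, q):
    """Plucker coordinates (direction, moment) of the line through distinct points p and q."""
    d = tuple(qi - pi for pi, qi in zip(p, q))
    return d + cross(p, q)

def klein_residual(L):
    """Left side of the Plucker relation d . m = 0, which every genuine line satisfies."""
    d, m = L[:3], L[3:]
    return sum(a * b for a, b in zip(d, m))

def same_line(L1, L2, tol=1e-9):
    """True when the two 6-vectors are proportional (all 2x2 minors vanish)."""
    return all(abs(L1[i] * L2[j] - L1[j] * L2[i]) < tol
               for i in range(6) for j in range(i + 1, 6))

if __name__ == "__main__":
    p, q = (1.0, 2.0, 3.0), (4.0, 0.0, -1.0)
    L = plucker(p, q)
    print(L, klein_residual(L))
    # two other points on the same line give proportional coordinates
    d = tuple(b - a for a, b in zip(p, q))
    p2 = tuple(a + 0.5 * di for a, di in zip(p, d))
    q2 = tuple(a + 3.0 * di for a, di in zip(p, d))
    print(same_line(L, plucker(p2, q2)))
```

Because only the ratios of the six coordinates matter, lines are points of a quadric in projective 5-space, which is one reason line configurations are harder to treat globally than point configurations.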

7. Comparison between shape in the similarity case and shape in the affine case.

The definition and study of shape spaces for the similarity group began with Kendall in 1977. He treated ordered k-tuples of points in Euclidean m-space (not all the same point). Two such k-tuples of feature points (or "landmarks") are deemed equivalent if they differ by a similarity transformation (i.e., rotation, translation, and/or positive scale). The resulting equivalence class is known as the "shape" of the configuration of the k feature points. The geometry of these spaces has been studied by many authors. In previous work we examined the shape spaces for the larger affine group and explored the relationship between the shape of a configuration of points in three dimensions and the shapes of all the images of that configuration in two dimensions under all possible (affine) generalized weak perspective projections. This leads to the notion of the object/image equations, which quantify the relationship between 3D object features (points) and 2D image features. They are zero if and only if a generalized weak perspective projection exists which takes the 3D data to the 2D data. The geometry in this case is particularly nice, relating as it does to properties of Grassmann manifolds. Also, the natural metric geometry, both in the classical similarity case (Procrustes metric) and the affine case (Fubini-Study metric), provides a way to measure the distance between shapes, both object shapes and image shapes, as well as providing a natural notion of distance (i.e., matching) between an object shape and an image shape.
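The two metrics named above can be sketched for planar point configurations stored as complex numbers. For the similarity case, the sketch uses Kendall's Riemannian shape distance, arccos |<u, v>| between centered, unit-norm configurations u and v. For the affine case, it represents a shape by the subspace spanned by the x-coordinates, the y-coordinates, and the all-ones vector, and measures the Fubini-Study distance between the corresponding points of the Plücker embedding, arccos |det(U^T V)| for orthonormal bases U and V. These formulas are standard, but the function names are hypothetical and the normalizations may differ from the report's. Configurations are assumed nondegenerate (not collinear).

```python
import math

def procrustes_distance(a, b):
    """Kendall's Riemannian shape distance between planar configurations given as complex lists."""
    def preshape(z):
        m = sum(z) / len(z)
        c = [zi - m for zi in z]
        n = math.sqrt(sum(abs(ci) ** 2 for ci in c))
        return [ci / n for ci in c]
    u, v = preshape(a), preshape(b)
    # |<u, v>| is unchanged by rotating either configuration (multiplying by a unit complex number)
    ip = abs(sum(ui * vi.conjugate() for ui, vi in zip(u, v)))
    return math.acos(min(1.0, ip))

def _orthonormal(cols):
    """Gram-Schmidt on linearly independent real columns."""
    basis = []
    for v in cols:
        w = list(v)
        for b in basis:
            c = sum(x * y for x, y in zip(w, b))
            w = [wi - c * bi for wi, bi in zip(w, b)]
        n = math.sqrt(sum(x * x for x in w))
        basis.append([x / n for x in w])
    return basis

def _det(M):
    """Determinant by Gaussian elimination with partial pivoting."""
    M = [row[:] for row in M]
    n, det = len(M), 1.0
    for i in range(n):
        p = max(range(i, n), key=lambda r: abs(M[r][i]))
        if abs(M[p][i]) < 1e-15:
            return 0.0
        if p != i:
            M[i], M[p] = M[p], M[i]
            det = -det
        det *= M[i][i]
        for r in range(i + 1, n):
            f = M[r][i] / M[i][i]
            M[r] = [x - f * y for x, y in zip(M[r], M[i])]
    return det

def fubini_study_affine(a, b):
    """Fubini-Study distance between affine shapes, using the Plucker embedding of span{x, y, 1}."""
    def basis(z):
        return _orthonormal([[zi.real for zi in z], [zi.imag for zi in z], [1.0] * len(z)])
    U, V = basis(a), basis(b)
    G = [[sum(x * y for x, y in zip(u, v)) for v in V] for u in U]
    # |det(U^T V)| is the absolute inner product of the two unit Plucker vectors
    return math.acos(min(1.0, abs(_det(G))))

if __name__ == "__main__":
    z = [0j, 1 + 0j, 0.3 + 1j, -0.5 + 0.4j, 0.8 + 0.9j]
    stretched = [complex(2 * zi.real + 0.5 * zi.imag + 1, zi.imag - 2) for zi in z]
    print(procrustes_distance(z, stretched))   # positive: not similar
    print(fubini_study_affine(z, stretched))   # ~0: affinely equivalent
```

The example shows the expected nesting: an affine stretch moves a configuration in Kendall's similarity shape space but leaves its affine shape unchanged.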

In our initial work under this grant, we sought to gain a clearer understanding of the various notions of shape (i.e., shape for different transformation groups acting on the feature points). The first problem we considered was the relationship between Kendall's shape spaces for the action of similarity transformations, consisting of rotations, translations, and scale (no reflections for the moment), and the shape spaces for the action of the affine group. We were able to make a complete analysis of this situation. The primary discoveries were the rather simple and elegant form of the map taking similarity shape to affine shape, and the complete analysis of the locus of degenerate similarity shapes, which leads to some interesting topological issues.
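A planar illustration of a similarity-to-affine map, written as our own sketch rather than the map from the cited paper: a centered configuration c (complex numbers) determines the real subspace spanned by its real and imaginary parts, and multiplying c by a nonzero complex number (rotation and scale) leaves that subspace unchanged, so the map is well defined on similarity shapes. It breaks down exactly when the two parts are linearly dependent, i.e. when the points are collinear. A short Gram-determinant calculation shows this happens precisely when |Σ c_i^2| = Σ |c_i|^2, which gives the inexpensive degeneracy test used below. The names and the tolerance are hypothetical.

```python
import math

def centered(z):
    m = sum(z) / len(z)
    return [zi - m for zi in z]

def degeneracy(z):
    """1 - |sum c_i^2| / sum |c_i|^2 for the centered configuration c.

    Zero exactly on the collinear (degenerate) locus; unchanged by rotation,
    translation and scale, so it is a function of similarity shape."""
    c = centered(z)
    s = sum(abs(ci) ** 2 for ci in c)
    t = abs(sum(ci * ci for ci in c))
    return 1.0 - t / s

def affine_shape(z, tol=1e-12):
    """Projector onto span{Re c, Im c}, or None when the configuration is collinear."""
    if degeneracy(z) < tol:
        return None
    c = centered(z)
    a = [ci.real for ci in c]
    b = [ci.imag for ci in c]
    na = math.sqrt(sum(x * x for x in a))
    e1 = [x / na for x in a]
    k = sum(x * y for x, y in zip(b, e1))
    w = [x - k * y for x, y in zip(b, e1)]
    nw = math.sqrt(sum(x * x for x in w))
    e2 = [x / nw for x in w]
    n = len(z)
    return [[e1[i] * e1[j] + e2[i] * e2[j] for j in range(n)] for i in range(n)]

if __name__ == "__main__":
    z = [0j, 1 + 0j, 0.2 + 1j, -0.6 + 0.5j]
    line = [complex(t, 2 * t + 1) for t in (0.0, 1.0, 2.5, -1.0)]
    print(degeneracy(z), affine_shape(line))  # positive value, then None for collinear points
```

Many similarity shapes (for example, a configuration and its stretched copy) map to the same affine shape, while the collinear locus is where the map is undefined.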

The results of this work appear in our paper "The Relationship Between Shape under Similarity Transformations and Shape under Affine Transformations," a copy of which was attached to one of our previous reports.


8. Invariants and Shape Characterizations for Unordered Feature Points

We have begun investigating extensions of our methods to the difficult case of unordered feature points. Here we can make use of new work of Boutin and Kemper, "On Reconstructing Configurations of Points in the Projective Plane from a Joint Distribution of Invariants," preprint, April 2004. This paper provides a complete description of the invariants for unordered point features in an image under full perspective. Our hope is to extend this to object features in 3D and then combine it with the projection from 3D to 2D to obtain invariant matching equations (object/image equations) that are also permutation invariant. This would in turn, we hope, lead to permutation-invariant metrics for point feature object/image comparisons in the full perspective case.
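The flavor of a permutation-invariant description can be seen in a much simpler setting than the projective one discussed above. The sketch below uses the sorted list of pairwise Euclidean distances, which is unchanged both by relabeling the points and by rigid motions. This is a Euclidean stand-in chosen for illustration, not the projective invariants of Boutin and Kemper; comparing sorted lists entry by entry is only a crude matching score, and the signature is not guaranteed to separate every pair of non-congruent configurations. The function names are hypothetical.

```python
import itertools
import math

def distance_signature(points):
    """Sorted pairwise distances: unchanged by relabeling the points and by rigid motions."""
    return sorted(math.dist(p, q) for p, q in itertools.combinations(points, 2))

def signature_distance(pts_a, pts_b):
    """Crude permutation-invariant score: Euclidean distance between equal-length signatures."""
    sa, sb = distance_signature(pts_a), distance_signature(pts_b)
    if len(sa) != len(sb):
        raise ValueError("configurations must have the same number of points")
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(sa, sb)))

if __name__ == "__main__":
    pts = [(0.0, 0.0), (1.0, 0.0), (0.3, 0.9), (-0.4, 0.6), (0.8, 1.3)]
    th = 0.7
    moved = [(math.cos(th) * x - math.sin(th) * y + 2.0,
              math.sin(th) * x + math.cos(th) * y - 1.0) for x, y in pts]
    relabeled = [moved[i] for i in (3, 0, 4, 1, 2)]
    print(signature_distance(pts, relabeled))  # ~0 despite relabeling and motion
```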


9. The Weak Perspective Case for Point Features

This material has been reported on previously and was recently written up in an invited book chapter (see below) published by Birkhäuser.


Characteristics of our Results

Below is a brief outline in bullet form of the principal characteristics of our approach to using object/image metrics for sensor exploitation and target identification.


+ Is a Feature-Based Approach to Target Identification


The approach makes use of small numbers of sensed features associated with features in the target geometry.

+ Invariance to Pose and Scale

The method allows identification to be achieved across all poses and, if desired, at varying scales without resorting to exhaustive template matching.

+ Based on Intrinsic Measures of Shape

To achieve invariance, the method makes use of the emerging mathematical theory of shape, which characterizes internal relationships among features independent of relevant transformations like rotation, translation, and scale. These characterizations also turn out to be independent of the coordinate system used to record the target or image feature locations.

+ Permits an Invariant Characterization of the Fundamental Matching Criteria Known as the Object/Image Equations

We can express the necessary and sufficient conditions for the invariant shape of a set of target features to be consistent with the invariant shape of a set of image features as a set of equations in local or global coordinates on the space of object shapes and image shapes. Consistency in this case means that there exists some pose of the object, some sensor location, and some set of sensor parameters that will result in the target features projecting through the sensor to the observed image features. These equations are called object/image equations.

Since the inputs to the (nonlinear) object/image equations are a set of target shape coordinates and a set of image shape coordinates, we can use the equations to determine matching invariantly, and also to determine all image shapes that can be achieved from a given object shape or all object shapes capable of producing a given image shape.

+ Relies on Metric Geometry

The methods yield intrinsic metrics on the "spaces of shapes." These metrics satisfy the usual triangle inequality and can be used to measure object or image shape similarity (up to the allowable transformations, e.g., rotation, translation, scale, etc.). In addition, they provide a mechanism for effective hashing in large databases of target or image feature sets.
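As background on the hashing idea, here is a minimal sketch of classic geometric hashing in the style of Lamdan and Wolfson, using affine-invariant coordinates of planar points relative to an ordered basis triple. It is not the metric-based hashing scheme the report has in mind; the cell size, the use of a single image basis, and all names are illustrative assumptions, and the sketch assumes the image points are listed so that the chosen image basis corresponds to some model basis.

```python
import itertools
from collections import defaultdict

def affine_coords(p, o, e1, e2):
    """(a, b) with p = o + a*(e1 - o) + b*(e2 - o); None if the basis triple is collinear."""
    ux, uy = e1[0] - o[0], e1[1] - o[1]
    vx, vy = e2[0] - o[0], e2[1] - o[1]
    det = ux * vy - uy * vx
    if abs(det) < 1e-12:
        return None
    px, py = p[0] - o[0], p[1] - o[1]
    return ((px * vy - py * vx) / det, (ux * py - uy * px) / det)

def cell_key(coords, cell):
    """Quantize invariant coordinates into a hash-table bucket."""
    return (round(coords[0] / cell), round(coords[1] / cell))

def build_table(models, cell=0.02):
    """Store every (model, basis) pair under the buckets of its non-basis points."""
    table = defaultdict(set)
    for name, pts in models.items():
        for basis in itertools.permutations(range(len(pts)), 3):
            o, e1, e2 = (pts[i] for i in basis)
            for i, p in enumerate(pts):
                c = None if i in basis else affine_coords(p, o, e1, e2)
                if c is not None:
                    table[cell_key(c, cell)].add((name, basis))
    return table

def recognize(table, image_pts, basis=(0, 1, 2), cell=0.02):
    """Vote for (model, basis) entries; return the model with the most votes, or None."""
    o, e1, e2 = (image_pts[i] for i in basis)
    votes = defaultdict(int)
    for i, p in enumerate(image_pts):
        c = None if i in basis else affine_coords(p, o, e1, e2)
        if c is None:
            continue
        for entry in table.get(cell_key(c, cell), ()):
            votes[entry] += 1
    return max(votes, key=votes.get)[0] if votes else None

if __name__ == "__main__":
    models = {
        "A": [(0.0, 0.0), (1.0, 0.0), (0.0, 1.0), (0.7, 0.4), (0.2, 0.7)],
        "B": [(0.0, 0.0), (1.0, 0.0), (0.0, 1.0), (0.3, 0.3), (0.9, 0.6)],
    }
    table = build_table(models)
    image = [(2.0 * x + 0.5 * y + 3.0, -0.3 * x + 1.5 * y - 1.0) for x, y in models["A"]]
    print(recognize(table, image))  # A
```

The table is built once offline, so recognition cost depends on the number of image points rather than on the number of stored models; a metric on shape space would play the role of the fixed quantization cells here.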

+ Yields a Natural Measure of Matching (Distance) between an Object Feature Set and an Image Feature Set

The theory provides two natural "metrics" for invariantly matching a given set of object features to a given set of image features. We can compute in object space, using the metric distance between object shapes, by finding the minimum distance between the given object and all objects capable of producing the given image. The alternative is to work in image space, using the metric in that space, to compute the minimum distance between the given image and all images of the given object. A deep duality theorem assures that, with suitable normalization, these two metrics are the same. This means that for any particular sensor type amenable to this approach, there is a unique natural pose- and scale-invariant measure of object/image closeness of match.

+ Amenable to Statistical Analysis and ATR Theory

Shape was originally introduced for the purpose of doing invariant statistical analysis. Recent workshops at AIM (American Institute of Mathematics), IMA (Institute for Mathematics and its Applications), and SAMSI (Statistical and Applied Mathematical Sciences Institute) dealing with the theme of shape and statistics on shape manifolds point to the likely development of new techniques to do more sophisticated statistical analysis of the ATR problem, and to the development of an ATR theory to predict optimal system performance. These conferences and workshops also point to a wide array of applications, including important applications of shape and shape statistics to medical images, automated inspection, and image segmentation.

+ Computationally Efficient

While the description of shape and the metrics above involves some rather sophisticated mathematics, in many instances the shape coordinates and metric values are easily and directly computable via simple and fast algorithms requiring minimal computational resources.

+ Maximally Robust

Because the metrics are based on shape, they are in some sense the "best possible" matching criteria as far as target geometry is concerned. As a result, the metrics should be maximally robust to sensor error, pixelization, or small target variations. Our recent studies with large synthetic databases of feature sets bear this out. Additional tests have been performed by Ms. Olga Mendoza at AFRL, Wright-Patterson AFB, and by researchers at the University of Illinois.


Personnel Supported:

In addition to the principal investigator, the project has provided support for two graduate research assistants: Mr. Kevin Abbott and Ms. Jennifer Snodgrass, both graduate students in the Mathematics Department at Texas A&M University.

Ms. Snodgrass was engaged in the coding and testing of a number of algorithms and in the design of computational experiments to verify various theoretical ideas that emerged during the course of our research. Ms. Snodgrass, who received her Bachelor's degree in Applied Mathematics from Rice University, completed work on her Master's degree in May 2005.


Mr. Abbott, a Ph.D. student in Mathematics, became involved in the project as a result of a graduate course in Shape Theory offered by the P.I., Dr. Stiller, in the Spring of 2004. This course presented the results of earlier AFOSR-sponsored research along with background material in differential geometry and statistical shape theory. Mr. Abbott recently completed his Ph.D. dissertation on algebro-geometric aspects of shape theory in the full perspective case. He graduated last month (August 2007) and has taken a job with Metron Corp. in Arlington, Virginia. Mr. Abbott was involved with several aspects of our collaboration with researchers at the Air Force Research Lab and accompanied the P.I. on visits to Wright-Patterson Air Force Base in 2006.

Faculty: Dr. Peter F. Stiller, Prof. of Mathematics and Computer Science

Graduate Students: Jennifer Snodgrass, Kevin Abbott


Publications:

Several publications dealing with this project's results have appeared or will appear in print shortly. Copies of the two most recent are attached to this report. The others were appended to our previous reports or are in the midst of the publication process.

Arnold, D. G., Mendoza, O., and Stiller, P. F., "Image Registration via Invariant Object/Image Equations and O/I-Metrics," Algorithms for Synthetic Aperture Radar Imagery XV, SPIE Defense and Security Symposium, Orlando, FL, 3/08, to appear.

Arnold, G., Stiller, P. F., and Sturtz, K., "Geometric Methods for ATR - Invariants, Object/Image Equations, and Metrics," under revision for publication, AFRL Technical Report, 45 pages (2007).

Stiller, P. F. and Abbott, K., "Recognizing Point Configurations in Full Perspective," Electronic Imaging, Vision Geometry XV, Vol. 6499, San Jose, CA, 12 pages (2007).

Stiller, P. F. and Arnold, D. G., "Mathematical Aspects of Shape Analysis for Object Recognition," Electronic Imaging, Visual Communications and Image Processing, Vol. 6508, San Jose, CA, 12 pages (2007).

Stiller, P. F., "Robustness and statistical analysis of object/image metrics," Electronic Imaging, Vision Geometry XIV, Vol. 6066, San Jose, CA, 1/06, 9 pages (2006).

Arnold, G., Stiller, P. F., and Sturtz, K., "Object-Image Metrics for Generalized Weak Perspective Projection," chapter in Statistics and Analysis of Shapes, Editors Hamid Krim and Anthony Yezzi, Jr., Birkhäuser, pp. 253-279 (2006).

Stiller, P. F., "Recognition of Configurations of Lines I - Weak Perspective Case," Proceedings SPIE Int'l Symposium on Optical Science and Technology, Mathematical Methods in Pattern and Image Analysis, Vol. 5916, Jaakko Astola, Editor, San Diego, CA, 8/05, 13 pages (2005).

Stiller, P. F., "The Relationship Between Shape under Similarity Transformations and Shape under Affine Transformations," Proceedings SPIE Int'l Symposium on Optical Science and Technology, Mathematics of Data/Image Coding, Compression, and Encryption, with Applications, Vol. 5561, Mark Schmalz, Editor, Denver, CO, 8/04, pp. 108-116 (2004).

Stiller, P. F., "Vision metrics and object/image relations II: Discrimination metrics and object/image duality," Electronic Imaging, Vision Geometry XII, Vol. 5300, San Jose, CA, pp. 74-85 (2004).


Interactions/Transitions:

In June 2004, Dr. Stiller visited the Air Force Research Laboratory's Target Recognition Branch, AFRL/SNAT, where plans for collaborative work were made and several of the topics in the proposal were discussed.

In August 2004, Dr. Stiller attended the SPIE International Conference on Optical Science and Technology in Denver for the conference on Mathematics of Data/Image Encoding, Compression, and Encryption VI, with Applications. He presented a paper entitled "The Relationship Between Shape under Similarity Transformations and Shape under Affine Transformations." At the meeting, Dr. Stiller continued discussions with Dr. Mark Schmalz of Florida State University on possible novel applications of our research on metrics for object recognition to the completely different problem of evaluating data compression and encryption schemes.

Also in August 2004, Dr. Stiller visited Vexcel Corporation in Boulder, Colorado, and presented a talk entitled "Shape Theory and Invariant Metrics for Object and Target Recognition." His visit was hosted by Dr. Carolyn Johnston. Dr. Stiller was originally put in contact with Dr. Johnston several years ago by Dr. Arje Nachman of the Air Force Office of Scientific Research.

From  January  17,  2005  to  January  23,  2005  Dr.  Stiller  again  visited  the 
Air  Force  Research  Laboratory's  Target  Recognition  Branch  AFRL/SNAT  to 
continue  his  research  collaboration  with  Dr.  Greg  Arnold.  Dr.  Stiller 
returned  to  AFRL/SNAT  in  May  and  June  of  2005.  During  that  visit,  work 
on  the  weak  perspective  case  for  point  features  was  completed  and  written 
up  in  an  invited  book  chapter  entitled  "Object-Image  Metrics  for 
Generalized  Weak  Perspective  Projection"  which  has  now  appeared  in  a 
volume entitled Statistics and Analysis of Shapes, edited by Professor
Hamid  Krim  of  North  Carolina  State  University. 

In  May  2005,  Dr.  Stiller  attended  the  AFOSR  Program  Review  at  North 
Carolina  State  University  hosted  by  Dr.  Jon  Sjogren,  AFOSR  and  Professor 
Hamid  Krim,  NC  State.  Dr.  Stiller  spoke  on  "Shape,  Shape  Matching  Metrics, 
and  Learning  Shape  by  Sampling  (Shapelets)"  jointly  with  Dr.  Greg  Arnold, 
AFRL/SNAT. 



While visiting AFRL's Target Recognition Branch AFRL/SNAT in June 2005, Dr.
Stiller held a number of discussions with Mr. Ron Dilsavor of SET
Associates,  Inc.  concerning  ways  to  use  this  project's  results  to 
recognize  objects  in  SAR  images. 

In  August  2005,  Dr.  Stiller  attended  the  SPIE  International  Conference  on 
Optical  Science  and  Technology  in  San  Diego  for  the  conference  on 
Mathematical Methods in Pattern and Image Analysis. He presented a paper
entitled "Recognition of Configurations of Lines I - Weak Perspective
Case."

Dr.  Stiller  was  an  invited  participant  in  the  IMA  Workshop  on  New 
Mathematics  and  Algorithms  for  3-D  Image  Analysis,  at  the  Institute  for 
Mathematics and its Applications, University of Minnesota, Minneapolis,
MN, Jan. 9-12, 2006. Several researchers from AFRL also attended,
including Dr. Greg Arnold, AFRL/SNAT, and Ms. Olga Mendoza, a recent hire
at AFRL. During the workshop, Dr. Stiller and Dr. Arnold held discussions
with Dr. Guillermo Sapiro, Department of Electrical and Computer
Engineering, University of Minnesota, concerning ideas for point cloud
matching, and with Professor Peter Olver concerning aspects of differential
invariants.

Dr.  Stiller  returned  to  the  Institute  for  Mathematics  and  its 
Applications,  University  of  Minnesota,  in  April  2006  to  attend  the 
Workshop on Shape Spaces (April 3-7, 2006) organized by Professor David
Mumford. Jointly with Dr. Arnold, AFRL/SNAT, Dr. Stiller held discussions
with Dr. T. J. Klausutis, AFRL, Eglin AFB, who also attended the
workshop.  These  discussions  concerned  applications  of  shape  theoretic 
techniques  to  various  Air  Force  target  recognition  problems. 

In  May  2006,  Dr.  Stiller  visited  Dr.  Arnold  at  the  Air  Force  Research 
Laboratory's  Target  Recognition  Branch  AFRL/SNAT.  The  purpose  was  to 
engage  in  collaborative  research  on  a  number  of  problems  including  the 
3D  reconstruction  from  motion  problem  and  various  shape  statistics 
problems.  Dr.  Stiller  also  worked  with  two  graduate  students  visiting 
AFRL/SNAT  for  the  summer.  During  this  visit  Dr.  Arnold  and  Dr.  Stiller 
traveled to Purdue University to speak with Dr. Mireille Boutin (mentioned
above) about her work on invariants for unordered point features and to
give a joint talk in the Electrical Engineering Department. This resulted
in Dr. Arnold and Dr. Stiller being invited by Prof. Boutin to submit a
paper to the SPIE conference on Electronic Imaging, Visual Communications
and Image Processing, which was held during January 2007 in San Jose.

In August 2006, Dr. Stiller returned to Wright-Patterson AFB to again
coordinate research efforts with Dr. Arnold and to attend the Multi-Modal
Biometrics Workshop hosted jointly by the Human Effectiveness Biosciences
and Protection Division and the Sensors ATR Division of AFRL at
Wright-Patterson AFB. The goal was to exchange ideas on recognition and
identification  technologies.  It  was  an  opportunity  for  us  to  explore 
applications  of  our  recognition  techniques  to  biometric  problems  such  as 
face/body  recognition  and  gait  analysis. 


From 30 January to 1 February 2007, Dr. Stiller attended SPIE's Conference on
Electronic Imaging held in San Jose, CA, to present two papers. The first
paper  "Recognizing  Point  Configurations  in  Full  Perspective"  was  joint 
with  his  Ph.D.  student  Kevin  Abbott  and  was  presented  in  Vision  Geometry 
XIV.  Dr.  Stiller  chaired  the  session  on  Surface  Analysis  and 
Reconstruction  in  that  conference.  The  second  paper  "Mathematical  Aspects 
of  Shape  Analysis  for  Object  Recognition"  was  an  invited  paper  for  the 
session  on  Visual  Communications  and  Image  Processing.  One  important 
research  contact  to  come  out  of  this  meeting  was  a  series  of  discussions 
with the 3D TV group at Philips Electronics. They are interested in
using  our  techniques  as  a  tool  for  adding  depth  information  to  existing 
video  content.  In  addition,  we  learned  that  researchers  at  the  University 
of  Illinois  are  using  our  Object/Image  metric  for  the  affine  case  in  a 
number  of  computer  vision  experiments. 

From  March  3rd  to  March  7th  2007  Dr.  Stiller  participated  in  the  workshop  on 
New  Directions  in  Complex  Data  Analysis  for  Emerging  Applications  that  was 
held  under  the  sponsorship  of  AFOSR  and  NSF  in  Breckenridge,  Colorado.  In 
addition to giving a brief presentation entitled "Algebraic Geometry,
Shape, and Understanding Configurations from Projections to Lower
Dimensions with Applications to Object Recognition and Image
Understanding," Dr. Stiller participated in various panel discussions.
While at the workshop, Dr. Stiller began discussions with Dr. Louis Scharf
on a geometric approach to a long-standing problem in signal processing.
This problem can be reinterpreted as minimizing a distance in a
Grassmannian between two subvarieties, one of which comes from the
k-secants of a rational normal curve and the other of which is a standard
Schubert cycle.

After  the  completion  of  the  Spring  semester  in  May  2007,  Dr.  Stiller  made 
another  visit  to  Wright  Patterson  AFB  to  again  coordinate  research  efforts 
with  Dr.  Arnold.  The  focus  was  on  updating  and  expanding  our  joint  paper 
"Geometric  Methods  for  ATR  -  Invariants,  Object/Image  Equations,  and 
Metrics" for publication. In addition, we continued discussions with Dr.
Matt Ferrara at AFRL on 3D target reconstruction from multiple 1D radar
range  profiles. 

On June 21st and 22nd, 2007, Dr. Stiller attended the AFOSR Sensing Program
Review at Harvard University. He spoke on "Shape, Shape Statistics, and
Reconstruction."

Dr. Stiller was an invited attendee at the SAMSI Summer Program on the
Geometry  and  Statistics  of  Shape.  This  program  ran  from  July  7th  through 
July  13th,  2007  at  the  Statistical  and  Applied  Mathematical  Sciences 
Institute  (SAMSI)  in  Research  Triangle  Park,  NC. 

In August 2007, Dr. Stiller returned to AFRL, Wright-Patterson, to continue
his collaboration with researchers there. In addition, a new collaborative
effort  was  begun  with  Ms.  Olga  Mendoza  (AFRL/SNAT)  dealing  with 
applications  of  our  object/image  metrics  in  the  affine  case  to  image 
registration  problems. 


New  Discoveries,  Inventions,  or  Patent  Disclosures: 

Beyond  the  research  results  discussed  above,  there  are  no  new  discoveries, 
inventions,  or  patent  disclosures. 


Person  completing  this  report: 

Dr.  Peter  F.  Stiller 

Professor  of  Mathematics  and  Computer  Science 

Associate  Director  of  the  Institute  for  Scientific  Computation 
Phone:  (409)  862-2905 
Fax:  (409)  845-5827 
Date:  30  September  2007 


Attachments:

1)  Summary  of  our  talk  at  the  Breckenridge  workshop. 

2) Abstract of our paper, "Image Registration via Invariant Object/Image
Equations and O/I-Metrics."

3)  Copy  of  our  slides  from  the  Sensing  Program  Review  at  Harvard. 

4)  Copy  of  our  recent  paper  "Recognizing  Point  Configurations  in  Full 
Perspective"  joint  with  Kevin  Abbott. 

5)  Copy  of  our  recent  paper  "Mathematical  Aspects  of  Shape  Analysis  for 
Object  Recognition"  joint  with  D.  Gregory  Arnold  (AFRL). 


Note: Attachments 3) through 5) are provided as separate files in the electronic version of
this report.


Attachment  #1 


New  Directions  in  Complex  Data  Analysis 
for  Emerging  Applications 

Breckenridge,  Colorado 
March  4-7,  2007 


Summary  of  Talk:  "Algebraic  Geometry,  Shape,  and  Understanding 
Configurations  from  Projections  to  Lower  Dimensions  with  Applications  to 
Object  Recognition  and  Image  Understanding"  by  Dr.  Peter  F.  Stiller, 
Professor  of  Mathematics  and  Computer  Science,  Associate  Director 
Institute  for  Scientific  Computation,  Texas  A&M  University. 

Efficiently recognizing three-dimensional arrangements of features on
an object from a single two-dimensional view requires an approach that is
view and pose invariant. Existing methods often rely on computationally
expensive template matching. Those methods use comparisons against
templates created for all possible views, with the infinite number of
possibilities being approximated by some finite number of views. To carry
out  an  invariant  approach  to  target  recognition,  we  need  to  exploit 
properties  and  relationships  that  are  geometrically  intrinsic  to  the 
objects  and/or  images  being  compared. 

Our  approach  to  view  and  pose  independence  (as  well  as  coordinate 
independence)  starts  with  a  characterization  of  a  configuration  of 
features  by  its  geometric  invariants.  The  specific  group  to  which  things 
should  be  invariant  is  a  function  of  the  sensor  type.  We  then  derive  a 
fundamental  set  of  equations  that  express,  in  an  invariant  way,  the 
relationship between the 3D geometry and its "residual" in a 2D (or 1D)
image.  These  equations  completely  and  invariantly  describe  the  mutual 
3D/2D  constraints.  Once  derived,  they  can  be  exploited  in  a  number  of 
ways.  For  example,  from  a  given  2D  configuration,  we  are  able  to 
determine  a  set  of  nonlinear  constraints  on  the  geometric  invariants  of 
the  3D  configurations  capable  of  producing  that  given  2D  configuration, 
and  thereby  arrive  at  a  test  for  determining  the  object  being  viewed. 
Conversely,  given  a  3D  geometric  configuration  (features  on  an  object),  we 
are  able  to  find  a  set  of  equations  that  constrain  the  invariants  of  the 
images of that object, helping to determine if that object appears in
selected images. With these results in hand, we plan in future work to
focus on three major problems: 1) object/image metrics on shape spaces to
provide a distance (difference) between two object configurations, two
image configurations, or an object and an image pair in pose-invariant,
coordinate-free terms; 2) reconstruction of an object's 3D shape from 2D
sensed information, either from multiple sensors or multiple images of a
moving object; and 3) statistical issues surrounding random shapes,
distributions of shapes, and noise in object recognition.

Issues and Collaborations Arising From the Conference: One topic that arose
in several of the presentations was the issue of dealing with data on
certain manifolds, most notably Grassmann manifolds. In the work of
Peterson and Kirby, and in my own work, complex image data is represented by
data points in a Grassmannian. Appropriate metrics and also procedures
for  fitting  subvarieties  to  such  data  need  to  be  developed.  The  general 
question  of  invariant  features  of  high  dimensional  data  under  projections 
to  lower  dimensions  is  also  an  interesting  one.  It  appears  that  some 
aspects  of  our  techniques  could  be  applied  to  such  high  dimensional 
problems.  Finally,  an  interesting  signal  processing  problem,  introduced 
to  the  author  by  Louis  Scharf  at  the  meeting,  appears  to  have  a  nice 
geometric  formulation  in  terms  of  secant  varieties  of  rational  normal 
curves,  where  the  same  sort  of  metrics  on  Grassmannians  play  a  role  in 
finding  the  optimal  answer.  We  are  currently  investigating  this. 
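To make the Grassmannian metrics mentioned above concrete, here is a minimal sketch, assuming subspaces are given by spanning vectors with exact rational entries. It computes one standard distance on a Grassmannian, the projection (chordal) distance, whose square is $\tfrac12\lVert P_1 - P_2\rVert_F^2 = \sum_i \sin^2\theta_i$, where $P_1, P_2$ are the orthogonal projectors onto the two subspaces and the $\theta_i$ are their principal angles. The function names are hypothetical, and this particular metric is chosen only for illustration; it is not claimed to be the metric used in the work described here.

```python
from fractions import Fraction as F


def transpose(a):
    return [list(col) for col in zip(*a)]


def matmul(a, b):
    """Product of two matrices stored as lists of rows."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]


def inverse(m):
    """Exact inverse of an invertible square matrix via Gauss-Jordan elimination."""
    n = len(m)
    a = [[F(x) for x in row] + [F(int(i == j)) for j in range(n)] for i, row in enumerate(m)]
    for c in range(n):
        p = next(r for r in range(c, n) if a[r][c] != 0)  # StopIteration if singular
        a[c], a[p] = a[p], a[c]
        piv = a[c][c]
        a[c] = [x / piv for x in a[c]]
        for r in range(n):
            if r != c and a[r][c] != 0:
                f = a[r][c]
                a[r] = [x - f * y for x, y in zip(a[r], a[c])]
    return [row[n:] for row in a]


def projector(basis):
    """Orthogonal projector A (A^T A)^{-1} A^T onto the span of the given basis vectors."""
    a = transpose([[F(x) for x in v] for v in basis])  # r x k, basis vectors as columns
    at = transpose(a)
    return matmul(matmul(a, inverse(matmul(at, a))), at)


def chordal_distance_sq(basis1, basis2):
    """Squared projection distance: half the squared Frobenius norm of P1 - P2."""
    p1, p2 = projector(basis1), projector(basis2)
    return sum((x - y) ** 2 for r1, r2 in zip(p1, p2) for x, y in zip(r1, r2)) / 2


# Two lines in R^3 at 45 degrees: sin^2(45 deg) = 1/2.
assert chordal_distance_sq([[1, 0, 0]], [[1, 1, 0]]) == F(1, 2)
# The same plane described by two different bases: distance 0.
assert chordal_distance_sq([[1, 0, 0], [0, 1, 0]], [[1, 1, 0], [1, -1, 0]]) == 0
```

Exact `Fraction` arithmetic sidesteps tolerance choices in this toy setting; for real image data one would more likely work in floating point and compute the principal angles from an SVD.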
Image Registration via Invariant Object/Image Equations
and O/I-Metrics

By D. Gregory Arnold, Olga Mendoza, and Peter F. Stiller


The  problem  of  single-view  recognition  is  central  to  many  target  recognition  and  computer 
vision tasks. Understanding how information available in a single image of an object or
scene,  be  it  an  optical  image,  a  SAR  image,  or  a  radar  range  profile,  relates  to  the 
target  object's  or  scene's  geometry  is  a  key  step  in  building  reliable  identification 
algorithms.  Likewise  such  knowledge  is  critical  to  understanding  how  two  different 
images  of  the  same  object  or  scene  are  related.  For  example,  without  a  priori  knowledge 
of  a  sensor's  viewpoint,  an  object's  pose,  or  a  sensor's  parameters,  it  is  difficult  to 
efficiently  recognize  a  three-dimensional  arrangement  of  features  (such  as  a  geometric 
configuration  of  lines  and/or  points)  on  an  object  or  to  efficiently  register  two  images 
of  the  same  object  or  scene.  What  is  needed  is  an  approach  that  is  invariant  to  changing 
viewpoints,  adjustments  in  the  sensor  parameters,  or  variations  in  the  pose. 

In  recent  work  the  authors  have  developed  such  an  approach  to  object  recognition,  and  the 
goal  of  this  paper  is  to  apply  the  same  techniques  to  the  registration  problem.  To  carry 
out  their  recognition  work,  they  started  with  a  characterization  of  a  configuration  of 
features  by  its  geometric  invariants.  The  specific  transformation  group  to  which  things 
needed  to  be  invariant  was  a  function  of  the  sensor  type.  They  then  derived  a 
fundamental  set  of  equations  that  expressed,  in  an  invariant  way,  the  relationship 
between the 3D geometry and its “residual” in a 2D image. These equations completely and
invariantly  described  the  mutual  3D/2D  constraints.  Once  derived,  the  equations  could  be 
exploited  in  a  number  of  ways.  For  example,  from  a  given  2D  configuration,  they  could 
determine  a  set  of  nonlinear  constraints  on  the  geometric  invariants  of  the  3D 
configurations  capable  of  producing  that  given  2D  configuration,  and  thereby  arrive  at  a 
test for determining the object being viewed. Here, having two images of the same 3D
configuration would add further constraints and reveal a good deal about the
relationship between the two images, thereby assisting with the registration of those
images.  That  is  something  we  take  up  in  this  paper.  Conversely,  given  a  3D  geometric 
configuration  (features  on  an  object),  a  set  of  equations  that  constrain  the  invariants 
of the images of that object were derived, helping to determine if that object appears in
selected  images.  These  equations  also  play  a  role  in  registration  of  different  images 
of  the  same  scene  or  object.  They  give  us  an  understanding  of  the  locus  of  all  images 
and  the  flow  from  image  to  image  as  the  sensor  moves.  We  discuss  applications  of  this  in 
the paper. Finally, the authors have developed certain natural invariant metrics (called
O/I-metrics) on the relevant shape spaces. These metrics provide a distance (difference)
measure  between  two  object  configurations  or  two  image  configurations  and  express  the 
distance  (failure  to  match)  between,  say,  an  image-image  pair.  These  metrics  are  pose 
and  view  invariant  and  can  be  expressed  in  coordinate  free  terms. 

For  example,  consider  the  generalized  weak  perspective  model  of  image  formation,  which  is 
appropriate to optical images when the object or scene is in the far field. Here the
relevant invariance is to the affine group of transformations. In this case the
O/I-metric for images will measure the failure of two images to differ by an affine
transformation.  As  such,  it  provides  a  quantification  of  the  drift  phenomenon  seen  in 
image  registration  done  via  affine  mappings. 
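A naive way to quantify how far two image configurations are from differing by an affine transformation is sketched below, as a hypothetical illustration rather than the O/I-metric itself: fit the best affine map from the first image to the second by least squares and report the squared residual. It is zero whenever the second image is obtained from the first by an affine transformation, but, unlike an O/I-metric, it is not symmetric in the two images and it scales with the coordinates of the second image. All names and data are assumptions made for the example.

```python
from fractions import Fraction as F


def transpose(a):
    return [list(col) for col in zip(*a)]


def matmul(a, b):
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]


def inverse(m):
    """Exact inverse of an invertible square matrix via Gauss-Jordan elimination."""
    n = len(m)
    a = [[F(x) for x in row] + [F(int(i == j)) for j in range(n)] for i, row in enumerate(m)]
    for c in range(n):
        p = next(r for r in range(c, n) if a[r][c] != 0)
        a[c], a[p] = a[p], a[c]
        piv = a[c][c]
        a[c] = [x / piv for x in a[c]]
        for r in range(n):
            if r != c and a[r][c] != 0:
                f = a[r][c]
                a[r] = [x - f * y for x, y in zip(a[r], a[c])]
    return [row[n:] for row in a]


def homogeneous(points):
    """3 x r matrix with rows u, v, 1 for planar points (u, v)."""
    return [[F(p[0]) for p in points], [F(p[1]) for p in points], [F(1)] * len(points)]


def affine_fit_residual(q1, q2):
    """Squared residual of the least-squares affine map taking image q1 to image q2.

    Solves min_X ||U2 - X M1||_F over 2 x 3 matrices X with the normal equations
    X = U2 M1^T (M1 M1^T)^{-1}; the points of q1 must be non-collinear.
    """
    m1 = homogeneous(q1)
    u2 = homogeneous(q2)[:2]
    m1t = transpose(m1)
    x = matmul(matmul(u2, m1t), inverse(matmul(m1, m1t)))
    fit = matmul(x, m1)
    return sum((a - b) ** 2 for ra, rb in zip(u2, fit) for a, b in zip(ra, rb))


# Hypothetical data: q2 is q1 moved by (u, v) -> (2u + v + 7, u + 3v - 1).
q1 = [(0, 0), (1, 0), (0, 1), (2, 3), (4, 1)]
q2 = [(2 * u + v + 7, u + 3 * v - 1) for u, v in q1]
q2_bad = q2[:-1] + [(q2[-1][0], q2[-1][1] + 1)]   # nudge one image point
assert affine_fit_residual(q1, q2) == 0
assert affine_fit_residual(q1, q2_bad) > 0
```

The asymmetry and scale dependence are exactly the defects an invariant, coordinate-free metric is meant to remove; the sketch only shows what "failure to differ by an affine transformation" looks like numerically.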

By understanding the contribution of a single image toward the recognition or recovery of
the geometry/shape of the object or scene for different sensors, it will be easier to
develop methods to integrate information from multiple images taken by uncalibrated,
distributed sensors of varying types, or to make use of a series of images taken by a
single sensor of a moving object. In this paper we investigate how to apply our
invariant techniques to the problem of registering those multiple images.


Shape, Shape Statistics, and Reconstruction

21 June 2007


Motivation 

• "How can we get a computer to determine
what object produced a given image?"
Basic Idea

• We represent an object by a $k$-tuple of
points in 3-space (object configuration).
Configuration


To  achieve  the  desired  invariance,  we  want 
to  match  shapes  of  object  configurations 
with  shapes  of  image  configurations  under 
The Affine Shape Spaces

• The space of object shapes: $(\mathbb{R}^3)^n / \mathrm{Aff}(3)$
Shape Coordinates

Embed $\mathrm{Gr}(k-n-1, k)$ into $\mathbb{P}^{\binom{k}{k-n-1}-1}$
The Object/Image Relations
All images of
We may compute a "distance" between an object shape $K^{k-4}$ and an image shape $L^{k-3}$ in
3 ways:

•  Working  in  object  space 
Object/Image Metric Duality

Theorem 1. For an object shape $K^{k-4}$ ...
Shapes of configurations of $k$ points are in
1-1 correspondence with varieties $V(P_1, \ldots,$
the conformal case. is the right error metric.
ABSTRACT

In  this  paper  we  examine  two  fundamental  problems  related  to  object  recognition  for  point  features  under 
full  perspective  projection.  The  first  problem  involves  the  geometric  constraints  (object-image  equations)  that 
must hold between a set of object feature points (object configuration) and any image of those points under
a  full  perspective  projection,  which  is  just  a  pinhole  camera  model  for  image  formation.  These  constraints  are 
formulated  in  an  invariant  way,  so  that  object  pose,  image  orientation,  or  the  choice  of  coordinates  used  to 
express  the  feature  point  locations  either  on  the  object  or  in  the  image  are  irrelevant.  These  constraints  turn  out 
to  be  expressions  in  the  shape  coordinates  calculated  from  the  feature  point  coordinates.  The  second  problem 
concerns the notion of shape and a description of the resulting shape spaces. These spaces acquire certain natural
metrics, but the metrics are often hard to compute. We will discuss certain cases where the computations are
manageable, but will leave the general case to a future paper.

Taken  all  together,  the  results  in  this  paper  provide  a  way  to  understand  the  relationship  that  exists  between 
3D geometry and its “residual” in a 2D image. This relationship is completely characterized (for a particular
combination  of  features)  by  the  above  set  of  fundamental  equations  in  the  3D  and  2D  shape  coordinates.  The 
equations  can  be  used  to  test  for  the  geometric  consistency  between  an  object  and  an  image.  For  example,  by 
fixing  point  features  on  a  known  object,  we  get  constraints  on  the  2D  shape  coordinates  of  possible  images  of 
those  features.  Conversely,  if  we  have  specific  2D  features  in  an  image,  we  will  get  constraints  on  the  3D  shape 
coordinates  of  objects  with  feature  points  capable  of  producing  that  image.  This  yields  a  test  for  which  object  is 
being viewed. The object-image equations are thus a fundamental tool for attacking identification/recognition
problems  in  computer  vision  and  automatic  target  recognition  applications. 

Keywords: object recognition, full perspective, object-image equations, shape, shape coordinates.


1. A REVIEW OF THE AFFINE CASE

We  consider  r  points  in  space,  which  we  think  of  as  feature  points  on  some  object.  We  refer  to  this  set  of 
points as an object configuration. Next, we “take a picture” of the object by choosing a plane and projecting
these feature points into that plane. We will call this set of points in the plane an image configuration.

In this section we will (1) identify the space of shapes, which are configurations modulo the action of a certain
group of transformations on $\mathbb{R}^n$, $n = 2, 3$, and give global coordinates on the shape space, (2) give necessary and
sufficient conditions for an image configuration to be a projection of an object configuration, and (3) define a
natural metric on the shape spaces. For additional details see Arnold, Stiller, and Sturtz [1].


1.1. The Generalized Weak Perspective Projection


The projections we will consider are called generalized weak perspective projections. If we represent points
in $\mathbb{R}^n$ ($n = 2$ or $3$) in the form

$$p = \begin{pmatrix} x_1 \\ \vdots \\ x_n \\ 1 \end{pmatrix},$$

then these projections as maps from $\mathbb{R}^3$ to $\mathbb{R}^2$ take the form

$$T = \begin{pmatrix} t_{11} & t_{12} & t_{13} & t_{14} \\ t_{21} & t_{22} & t_{23} & t_{24} \\ 0 & 0 & 0 & 1 \end{pmatrix},$$

where $T$ has rank 3.

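Now let $A$ be an invertible $3 \times 3$ matrix and let $B$ be an invertible $4 \times 4$ matrix. It turns out that if $T$ is a
generalized weak perspective projection, then $ATB$ is a generalized weak perspective projection if and only if $A$
and $B$ are affine transformations, i.e., $A$ and $B$ take the form

$$\begin{pmatrix} S & \mathbf{c} \\ 0 \;\cdots\; 0 & 1 \end{pmatrix}, \qquad \mathbf{c} = \begin{pmatrix} c_1 \\ \vdots \\ c_n \end{pmatrix},$$

where $S \in GL(n)$ and $c_1, \ldots, c_n \in \mathbb{R}$.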

What does this mean in terms of our object and image configurations? Suppose $Q \in \mathbb{R}^2$ is the image of a
point $P \in \mathbb{R}^3$ under a generalized weak perspective projection $T$, i.e., $Q = TP$. Then if we move $P$ by some
affine transformation $B$ to another point $P'$, and if we move $Q$ to another point $Q'$ by an affine transformation
$A$, we will have that $Q' = ATB^{-1}P'$. As a result, we see that $Q'$ is the image of $P'$ under the generalized weak
perspective projection $ATB^{-1}$ (since $A$ and $B^{-1}$ are both affine transformations).

This  observation  shows  us  that  by  choosing  to  consider  generalized  weak  perspective  projections,  the  best 
that  we  can  hope  to  do  is  relate  object  configurations  to  image  configurations  up  to  affine  transformations. 
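The equivariance just described can be checked on a small example. The sketch below uses hypothetical matrices $T$, $A$, $B$ and a point $P$ (none taken from this paper), exact `Fraction` arithmetic, and a hypothetical helper `is_gwp` that tests the form $T$ must have (last row $(0,0,0,1)$, rank 3). It confirms that $ATB^{-1}$ again has that form and maps $P' = BP$ to $Q' = AQ$.

```python
from fractions import Fraction as F


def matmul(a, b):
    """Product of two matrices stored as lists of rows."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]


def inverse(m):
    """Exact inverse of an invertible square matrix via Gauss-Jordan elimination."""
    n = len(m)
    a = [[F(x) for x in row] + [F(int(i == j)) for j in range(n)] for i, row in enumerate(m)]
    for c in range(n):
        p = next(r for r in range(c, n) if a[r][c] != 0)
        a[c], a[p] = a[p], a[c]
        piv = a[c][c]
        a[c] = [x / piv for x in a[c]]
        for r in range(n):
            if r != c and a[r][c] != 0:
                f = a[r][c]
                a[r] = [x - f * y for x, y in zip(a[r], a[c])]
    return [row[n:] for row in a]


def is_gwp(t):
    """Generalized weak perspective form: 3 x 4, last row (0, 0, 0, 1), rank 3."""
    if len(t) != 3 or any(len(row) != 4 for row in t) or list(t[2]) != [0, 0, 0, 1]:
        return False
    # With last row (0, 0, 0, 1), rank 3 means the upper-left 2 x 3 block has rank 2,
    # i.e. one of its 2 x 2 minors is non-zero.
    return any(t[0][i] * t[1][j] - t[0][j] * t[1][i] != 0
               for i in range(3) for j in range(i + 1, 3))


# Hypothetical example data.
T = [[1, 2, 0, 5], [0, 1, 3, -1], [0, 0, 0, 1]]                # projection R^3 -> R^2
A = [[2, 1, 4], [0, 3, -2], [0, 0, 1]]                         # affine map of the image plane
B = [[1, 0, 2, 1], [1, 1, 0, 0], [0, 1, 1, 3], [0, 0, 0, 1]]   # affine map of 3-space

P = [[3], [1], [2], [1]]          # homogeneous object point
Q = matmul(T, P)                  # its image, Q = T P
P_moved = matmul(B, P)            # P' = B P
Q_moved = matmul(A, Q)            # Q' = A Q

T_moved = matmul(matmul(A, T), inverse(B))   # A T B^{-1}
assert is_gwp(T) and is_gwp(T_moved)
assert matmul(T_moved, P_moved) == Q_moved   # Q' is the image of P' under A T B^{-1}
```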

1.2.  The  Affine  Shape  Spaces 

As the preceding observation suggests, we should consider two configurations (object or image) equivalent if
they differ by an affine transformation. In a sense, equivalent configurations are the same object or image just
rotated, translated, scaled, or otherwise moved by an affine transformation. We would like to construct the space
of configurations of $r$ points in $\mathbb{R}^n$ modulo the action of the group of affine transformations. These spaces would
then represent the distinct objects and images independent of pose or view. To do this we must assume that
the points in our configuration are non-coplanar for $n = 3$ or non-collinear for $n = 2$, which is reasonable since a
configuration of coplanar points in $\mathbb{R}^3$ would in fact be a configuration of points in $\mathbb{R}^2$ and would not represent
a real 3D object, etc.

Let $P_i = (x_{i,1}, \ldots, x_{i,n})$ for $i = 1, \ldots, r$, $r \geq n + 2$, be a configuration of $r$ non-coplanar (or non-collinear)
points in $\mathbb{R}^n$, $n = 3$ (or 2), and consider the matrix

$$M = \begin{pmatrix} x_{1,1} & x_{2,1} & \cdots & x_{r,1} \\ x_{1,2} & x_{2,2} & \cdots & x_{r,2} \\ \vdots & \vdots & & \vdots \\ x_{1,n} & x_{2,n} & \cdots & x_{r,n} \\ 1 & 1 & \cdots & 1 \end{pmatrix}.$$

Now to the configuration $P_1, \ldots, P_r$ we associate an $(r - n - 1)$-dimensional linear subspace $K^{r-n-1} \subset \mathbb{R}^r$.
In particular, $K^{r-n-1}$ is the null space of $M$ when we view $M$ as a linear map from $\mathbb{R}^r$ to $\mathbb{R}^{n+1}$. The fact that
$K^{r-n-1}$ has dimension $r - n - 1$ follows from the observation that $M$ has rank $n + 1$ as a linear map, because at
least one $(n + 1) \times (n + 1)$ minor of $M$ has non-zero determinant due to non-coplanarity (or non-collinearity).


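If we move the configuration by an affine transformation $B$, acting on the columns of $M$ as homogeneous points,
the matrix of the new configuration is $M^* = BM$; since $B$ is invertible, the null space of $M^*$ is exactly $K^{r-n-1}$,
the null space of $M$. Moreover, since the last row of $M$ consists of ones,
$K^{r-n-1} \subset H^{r-1} = \{(u_1, \ldots, u_r) \in \mathbb{R}^r \mid \sum_{i=1}^{r} u_i = 0\}$, so we may assign to our configuration the unique point
$[K^{r-n-1}] \in G(r - n - 1, H^{r-1})$, the Grassmannian of $(r - n - 1)$-dimensional subspaces in the $(r - 1)$-dimensional
space $H^{r-1}$, a well understood compact manifold of dimension $n(r - n - 1)$.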

Definition 1.1. We call the manifold $X = G(r - n - 1, H^{r-1})$ the affine shape space for configurations of $r$
points in $\mathbb{R}^n$. If $n = 3$, we will call $X = G(r - 4, r - 1)$ affine object space (or just object space) and refer to
points in this space as object shapes. If $n = 2$, we will call $X = G(r - 3, r - 1)$ affine image space (or just image
space) and refer to its elements as image shapes.

Every point in $X$ is of the form $[K^{r-n-1}]$ for some configuration $P_1, \ldots, P_r \in \mathbb{R}^n$, and, most importantly, if
two configurations $P_1, \ldots, P_r \in \mathbb{R}^n$ and $P'_1, \ldots, P'_r \in \mathbb{R}^n$ give the same point in $X$, then they differ by an affine
transformation.
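The construction of $K^{r-n-1}$ and the invariance just stated can be checked on a small example. The sketch below (hypothetical helper names and data, exact rational arithmetic) builds $M$ for six points in $\mathbb{R}^3$, computes a basis of its null space, and confirms that (i) the dimension is $r - n - 1$, (ii) every null vector has coordinates summing to zero, so $K^{r-n-1} \subset H^{r-1}$, and (iii) after an affine change of the points the new matrix still annihilates the same basis; since both null spaces have the same dimension, the two configurations give the same point of $X$. Perturbing a single point breaks (iii).

```python
from fractions import Fraction as F


def config_matrix(points):
    """(n+1) x r matrix M: point coordinates as columns plus a final row of ones."""
    n = len(points[0])
    return [[p[k] for p in points] for k in range(n)] + [[1] * len(points)]


def null_space(m):
    """Basis of the null space of m, via reduced row echelon form over the rationals."""
    a = [[F(x) for x in row] for row in m]
    rows, cols = len(a), len(a[0])
    pivots = []
    for c in range(cols):
        r = len(pivots)
        if r == rows:
            break
        p = next((i for i in range(r, rows) if a[i][c] != 0), None)
        if p is None:
            continue
        a[r], a[p] = a[p], a[r]
        piv = a[r][c]
        a[r] = [x / piv for x in a[r]]
        for i in range(rows):
            if i != r and a[i][c] != 0:
                f = a[i][c]
                a[i] = [x - f * y for x, y in zip(a[i], a[r])]
        pivots.append(c)
    basis = []
    for free in (c for c in range(cols) if c not in pivots):
        v = [F(0)] * cols
        v[free] = F(1)                       # one free variable set to 1, the others to 0
        for i, pc in enumerate(pivots):
            v[pc] = -a[i][free]
        basis.append(v)
    return basis


def annihilates(m, v):
    return all(sum(x * y for x, y in zip(row, v)) == 0 for row in m)


def apply_affine(S, c, p):
    """Image of the point p under x -> S x + c."""
    return tuple(sum(S[i][j] * p[j] for j in range(len(p))) + c[i] for i in range(len(p)))


pts = [(0, 0, 0), (1, 0, 0), (0, 1, 0), (0, 0, 1), (2, 3, 1), (1, 5, 4)]  # r = 6, n = 3
K = null_space(config_matrix(pts))
assert len(K) == len(pts) - 3 - 1            # dim K = r - n - 1
assert all(sum(v) == 0 for v in K)           # K lies in H^{r-1}

S = [[2, 1, 0], [0, 1, 1], [1, 0, 3]]        # det S = 7
c = (1, -2, 5)
moved = [apply_affine(S, c, p) for p in pts]
assert all(annihilates(config_matrix(moved), v) for v in K)

perturbed = pts[:-1] + [(1, 5, 5)]
assert not all(annihilates(config_matrix(perturbed), v) for v in K)
```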

1.3. The Plücker Embedding

Since $X$ is a real manifold, we can find local coordinates for a point $[K] \in X$; however, since we ultimately
want to give relations that tell us when an image configuration is a projection of an object configuration, it
would be more convenient to find global coordinates on $X$. We may do so by mapping $X$ into a projective space
via the Plücker embedding. In general, the Plücker embedding embeds a Grassmannian $G(n, r)$ ($n$-dimensional
subspaces of an $r$-dimensional vector space $V^r$) in the projective space $\mathbb{P}(\bigwedge^n V^r) \cong \mathbb{P}^{\binom{r}{n}-1}$ as a
projective variety in the following way: let $[K] \in G(n, r)$. Then $K$ is the intersection of $r - n$ hyperplanes in our
vector space $V^r$, where each hyperplane is given by a linear form

$$\sum_{j=1}^{r} k_{i,j}\, e_j^*, \qquad i = 1, \ldots, r - n,$$

where $e_1, \ldots, e_r$ is a basis for $V^r$ and $e_1^*, \ldots, e_r^*$ are the dual basis. More simply put, $K$ is the null space of the matrix

$$L = \begin{pmatrix} k_{1,1} & k_{1,2} & \cdots & k_{1,r} \\ k_{2,1} & k_{2,2} & \cdots & k_{2,r} \\ \vdots & \vdots & & \vdots \\ k_{r-n,1} & k_{r-n,2} & \cdots & k_{r-n,r} \end{pmatrix}.$$

Now for each $1 \leq i_1 < i_2 < \cdots < i_{r-n} \leq r$ we define $[i_1, i_2, \ldots, i_{r-n}]$ to be the determinant of the
$(r - n) \times (r - n)$ minor of $L$ whose columns are the $i_1, i_2, \ldots, i_{r-n}$ columns of $L$, i.e.

$$[i_1, i_2, \ldots, i_{r-n}] = \det \begin{pmatrix} k_{1,i_1} & \cdots & k_{1,i_{r-n}} \\ \vdots & & \vdots \\ k_{r-n,i_1} & \cdots & k_{r-n,i_{r-n}} \end{pmatrix}.$$


The Plücker embedding is now defined to be the map

$$\phi_{n,r} : G(n, r) \to \mathbb{P}^{\binom{r}{n}-1}, \qquad [K] \mapsto \big([1, 2, \ldots, r - n] : \cdots : [n + 1, n + 2, \ldots, r]\big) \quad \text{(all minors)},$$

and the homogeneous coordinates of $\phi_{n,r}([K])$ are called the Plücker coordinates of $K$.

It is important to note that this map does not depend on our choice of hyperplanes, but does depend on
our choice of basis for $V^r$. We should also note that this map does in fact embed $G(n, r)$ as a closed projective
variety in $\mathbb{P}^{\binom{r}{n}-1}$. In other words, $\phi_{n,r}(G(n, r))$ is the zero locus of some system of homogeneous polynomials
$f_1, \ldots, f_s$ in the variables $x_{1,2,\ldots,r-n}, \ldots, x_{n+1,\ldots,r}$ with coefficients in the base field of $V^r$. We use the variables
$x_{i_1,\ldots,i_{r-n}}$ to indicate that the corresponding coordinate of $\phi_{n,r}([K])$ is $[i_1, \ldots, i_{r-n}]$. The equations
$f_i = 0$, $1 \leq i \leq s$, are known as the Plücker relations (see [3] or [4]).
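For the smallest interesting case, $G(2, 4)$, there is a single Plücker relation, $[1,2][3,4] - [1,3][2,4] + [1,4][2,3] = 0$. The sketch below (plain Python, hypothetical helper names and data) computes the six $2 \times 2$ minors of a $2 \times 4$ matrix $L$, checks this relation, and checks that replacing $L$ by the product $G \cdot L$ for an invertible $2 \times 2$ matrix $G$ (a different choice of the two hyperplanes cutting out the same subspace) multiplies every coordinate by $\det G$, so the point of $\mathbb{P}^5$ is unchanged.

```python
from itertools import combinations


def minors_2x4(m):
    """Plücker coordinates of a 2 x 4 matrix: its 2 x 2 minors, keyed by 1-based column pairs."""
    return {(i + 1, j + 1): m[0][i] * m[1][j] - m[0][j] * m[1][i]
            for i, j in combinations(range(4), 2)}


def pluecker_relation(p):
    """Left-hand side of the single Plücker relation for G(2, 4)."""
    return p[1, 2] * p[3, 4] - p[1, 3] * p[2, 4] + p[1, 4] * p[2, 3]


def matmul(a, b):
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]


L = [[1, 2, 3, 4], [5, 6, 7, 8]]   # hypothetical: two linear forms cutting out K
G = [[2, 1], [1, 2]]               # another choice of hyperplanes; det G = 3

p = minors_2x4(L)
q = minors_2x4(matmul(G, L))
assert pluecker_relation(p) == 0
assert all(q[key] == 3 * p[key] for key in p)   # same point of P^5
```

A generic point of $\mathbb{P}^5$ does not satisfy the relation (for example, all six coordinates equal to 1 give the value 1), which is why the image of the embedding is a proper subvariety.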

One way to give global coordinates on $X = G(r - n - 1, H^{r-1})$ would be to embed $X$ into the projective
space $\mathbb{P}^{\binom{r-1}{n}-1}$ via the Plücker embedding. However, this would require us to choose a basis for
$H^{r-1}$. Fortunately, there is a very natural way to avoid this problem.

Since $K^{r-n-1} \subset H^{r-1} \subset \mathbb{R}^r$, we may view $X$ as a submanifold of $G(r - n - 1, r)$, in which case $\phi_{r-n-1,r}$
embeds $X$ in $\mathbb{P}^{\binom{r}{n+1}-1}$ as a subvariety of $\phi_{r-n-1,r}(G(r - n - 1, r))$. Under this map, a configuration
$P_i = (x_{i,1}, \ldots, x_{i,n})$, $i = 1, \ldots, r$, is mapped into $\mathbb{P}^{\binom{r}{n+1}-1}$ by taking all the determinants of the maximal minors of
our original feature point matrix

$$M = \begin{pmatrix} x_{1,1} & x_{2,1} & \cdots & x_{r,1} \\ x_{1,2} & x_{2,2} & \cdots & x_{r,2} \\ \vdots & \vdots & & \vdots \\ x_{1,n} & x_{2,n} & \cdots & x_{r,n} \\ 1 & 1 & \cdots & 1 \end{pmatrix}.$$

Embedding our shape space $X$ into $\mathbb{P}^{\binom{r}{n+1}-1}$ in this fashion is in some sense a more natural way to give
global coordinates on $X$ than embedding it into $\mathbb{P}^{\binom{r-1}{n}-1}$. This method allows us to work directly with the
matrix determined by our configuration rather than forcing us to choose a basis for $H^{r-1}$ and then rewriting
our basis for $K^{r-n-1}$ in terms of our chosen basis for $H^{r-1}$. Also, as we will see later in this paper, this method is
more closely related to the one that we will use in the full perspective case.

Definition 1.2. Given a configuration $P_1, \ldots, P_r \in \mathbb{R}^n$, we will refer to the Plücker coordinates of $K^{r-n-1}$
viewed as a subspace of $\mathbb{R}^r$ (rather than $H^{r-1}$) as the shape coordinates of the configuration $P_1, \ldots, P_r$.
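Definition 1.2 can be sketched directly: the shape coordinates are the $(n+1) \times (n+1)$ minors of $M$. In the sketch below (hypothetical helper names and data), an affine change $x \mapsto Sx + c$ multiplies $M$ on the left by an invertible $(n+1) \times (n+1)$ matrix of determinant $\det S$, so every shape coordinate is multiplied by $\det S$ and the configuration still gives the same point of $\mathbb{P}^{\binom{r}{n+1}-1}$.

```python
from itertools import combinations


def det(m):
    """Determinant by cofactor expansion along the first row (fine for small integer matrices)."""
    if len(m) == 1:
        return m[0][0]
    return sum((-1) ** j * m[0][j] * det([row[:j] + row[j + 1:] for row in m[1:]])
               for j in range(len(m)))


def shape_coordinates(points):
    """All maximal minors of M = [coordinates; 1 ... 1], keyed by 0-based column tuples."""
    n, r = len(points[0]), len(points)
    m = [[p[k] for p in points] for k in range(n)] + [[1] * r]
    return {cols: det([[row[c] for c in cols] for row in m])
            for cols in combinations(range(r), n + 1)}


# Hypothetical image configuration: r = 5 non-collinear points in R^2 (n = 2).
pts = [(0, 0), (1, 0), (0, 1), (2, 3), (4, 1)]
moved = [(2 * u + v + 7, u + 3 * v - 1) for u, v in pts]   # affine map with det S = 5

p = shape_coordinates(pts)
q = shape_coordinates(moved)
assert len(p) == 10                          # binom(5, 3) coordinates
assert all(q[c] == 5 * p[c] for c in p)      # same point of P^9
```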

1.4. The Object/Image Relations

Given an object configuration $P_1, \ldots, P_r$ and an image configuration $Q_1, \ldots, Q_r$, we want to give necessary
and sufficient conditions (the object-image relations) for the $Q_i$ to be a generalized weak perspective projection
of the $P_i$. Recall that we view our object space $X$ as a subvariety of $\mathbb{P}^{\binom{r}{4}-1}$ and our image space $Y$ as a subvariety
of $\mathbb{P}^{\binom{r}{3}-1}$. As such, we want to view the set $V$ of pairs $(K, L)$, where $L$ is an image shape that comes from a
generalized weak perspective projection of the object shape $K$ (the so-called set of matching object-image pairs),
as a subvariety $V \subset X \times Y \subset \mathbb{P}^{\binom{r}{4}-1} \times \mathbb{P}^{\binom{r}{3}-1}$. Therefore, our object-image relations should be a system of
bihomogeneous polynomials in the object and image shape coordinates whose zero locus is precisely $V$.

Recall that our object shapes are linear subspaces $K^{r-4} \subset \mathbb{R}^r$ of dimension $r-4$ and our image shapes are linear subspaces $L^{r-3} \subset \mathbb{R}^r$ of dimension $r-3$. The following theorem relates object and image shapes under generalized weak perspective projection.

THEOREM 1.3. Let $P_1, \ldots, P_r$ be an object configuration with corresponding object shape $K^{r-4}$, and let $Q_1, \ldots, Q_r$ be an image configuration with corresponding image shape $L^{r-3}$. Then the $Q_i$ are a generalized weak perspective projection of the $P_i$ if and only if

$$K^{r-4} \subset L^{r-3} \subset H^{r-1} \subset \mathbb{R}^r.$$

This  fact  and  the  incidence  relations  given  in  Theorem  I,  §5,  Chapter  VII  of  Hodge  and  Pedoe  [«l|  give  us  our 
object-image  relations. 


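THEOREM 1.4. Let $P_i = (x_i, y_i, z_i)$, $1 \le i \le r$, be an object configuration with corresponding matrix

$$\begin{pmatrix} x_1 & x_2 & \cdots & x_r \\ y_1 & y_2 & \cdots & y_r \\ z_1 & z_2 & \cdots & z_r \\ 1 & 1 & \cdots & 1 \end{pmatrix}$$

and let $Q_i = (u_i, v_i)$, $1 \le i \le r$, be an image configuration with corresponding matrix

$$\begin{pmatrix} u_1 & u_2 & \cdots & u_r \\ v_1 & v_2 & \cdots & v_r \\ 1 & 1 & \cdots & 1 \end{pmatrix}.$$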


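For $1 \le i_1 < i_2 < i_3 < i_4 \le r$ and $1 \le j_1 < j_2 < j_3 \le r$, define the object shape coordinates

$$m_{i_1 i_2 i_3 i_4} = \det\begin{pmatrix} x_{i_1} & x_{i_2} & x_{i_3} & x_{i_4} \\ y_{i_1} & y_{i_2} & y_{i_3} & y_{i_4} \\ z_{i_1} & z_{i_2} & z_{i_3} & z_{i_4} \\ 1 & 1 & 1 & 1 \end{pmatrix}$$

and the image shape coordinates

$$n_{j_1 j_2 j_3} = \det\begin{pmatrix} u_{j_1} & u_{j_2} & u_{j_3} \\ v_{j_1} & v_{j_2} & v_{j_3} \\ 1 & 1 & 1 \end{pmatrix}.$$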


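Then the points $Q_1, \ldots, Q_r$ are the images of $P_1, \ldots, P_r$ under a generalized weak perspective projection if and only if

$$\sum_{1 \le \lambda_1 < \lambda_2 \le r} \epsilon_{\lambda_1 \lambda_2}\, m_{\alpha_1 \alpha_2 \lambda_1 \lambda_2}\, n_{\gamma_1 \gamma_2 \gamma_3} = 0$$

for all choices of $1 \le \alpha_1 < \alpha_2 \le r$ and $1 \le \beta_1 < \beta_2 < \cdots < \beta_{r-5} \le r$. The indices and signs are determined as follows.

- If $\lambda_1, \lambda_2, \beta_1, \ldots, \beta_{r-5}$ are distinct, then $1 \le \gamma_1 < \gamma_2 < \gamma_3 \le r$ is the complement of $\{\lambda_1, \lambda_2, \beta_1, \ldots, \beta_{r-5}\}$. Otherwise $n_{\gamma_1 \gamma_2 \gamma_3} = 0$.
- $\epsilon_{\lambda_1 \lambda_2}$ is the sign of the permutation

$$(\gamma_1, \gamma_2, \gamma_3, \lambda_1, \lambda_2, \beta_1, \ldots, \beta_{r-5})$$

of the numbers $1, \ldots, r$.

The expressions $m_{\alpha_1 \alpha_2 \lambda_1 \lambda_2}$ and $n_{\gamma_1 \gamma_2 \gamma_3}$ should be treated as skew-symmetric in their indices.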

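As an example, consider the case $r = 5$. We pick $\alpha_1 = 1$ and $\alpha_2 = 2$; no $\beta$'s are required. Our formula becomes

$$\sum_{1 \le \lambda_1 < \lambda_2 \le 5} \epsilon_{\lambda_1 \lambda_2}\, m_{12 \lambda_1 \lambda_2}\, n_{\gamma_1 \gamma_2 \gamma_3} = 0,$$

where $\gamma_1 \gamma_2 \gamma_3$ is the complement of $\lambda_1, \lambda_2$ in $\{1, \ldots, 5\}$. Since $m_{12\lambda_1\lambda_2} = 0$ whenever $\lambda_1$ or $\lambda_2$ is 1 or 2, only $(\lambda_1, \lambda_2) = (3,4), (3,5), (4,5)$ contribute, with signs $+, -, +$. This yields

$$m_{1234}\, n_{125} - m_{1235}\, n_{124} + m_{1245}\, n_{123} = 0.$$

We get 10 such equations as we vary $\alpha_1$ and $\alpha_2$.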
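The relations of Theorem 1.4 can be evaluated mechanically. The sketch below is our own illustration and makes one stated assumption: a generalized weak perspective projection is taken to be an affine map $P \mapsto AP + b$ with $A$ a $2 \times 3$ matrix. Such a map puts the image rows $u, v, 1$ in the row space of the object matrix, which is the containment in Theorem 1.3. All function names and the sample data are hypothetical.

Minors are taken with the columns in the order given. They are therefore automatically skew-symmetric and vanish on repeated indices, as the theorem requires.

```python
from fractions import Fraction
from itertools import combinations


def det(rows):
    """Exact determinant of a square matrix by Gaussian elimination over Q."""
    a = [[Fraction(x) for x in row] for row in rows]
    n, result = len(a), Fraction(1)
    for col in range(n):
        pivot = next((i for i in range(col, n) if a[i][col] != 0), None)
        if pivot is None:
            return Fraction(0)
        if pivot != col:
            a[col], a[pivot] = a[pivot], a[col]
            result = -result
        result *= a[col][col]
        for i in range(col + 1, n):
            factor = a[i][col] / a[col][col]
            for j in range(col, n):
                a[i][j] -= factor * a[col][j]
    return result


def minor(matrix, cols):
    """Determinant of the 1-based columns `cols`, taken in the given order.

    This is skew-symmetric in `cols` and is 0 when an index repeats."""
    return det([[row[j - 1] for j in cols] for row in matrix])


def perm_sign(seq):
    """Sign of a sequence of distinct integers, by counting inversions."""
    inversions = sum(1 for i in range(len(seq))
                     for j in range(i + 1, len(seq)) if seq[i] > seq[j])
    return -1 if inversions % 2 else 1


def object_image_relations(obj, img):
    """Left-hand sides of Theorem 1.4, keyed by ((alpha_1, alpha_2), beta)."""
    r = len(obj)
    indices = range(1, r + 1)
    M = [[p[k] for p in obj] for k in range(3)] + [[1] * r]  # rows x, y, z, 1
    N = [[q[k] for q in img] for k in range(2)] + [[1] * r]  # rows u, v, 1
    values = {}
    for alpha in combinations(indices, 2):
        for beta in combinations(indices, r - 5):
            total = Fraction(0)
            for lam in combinations(indices, 2):
                used = set(lam) | set(beta)
                if len(used) < len(lam) + len(beta):
                    continue  # repeated indices: n_gamma is taken to be 0
                gamma = tuple(sorted(set(indices) - used))
                eps = perm_sign(gamma + lam + beta)
                total += eps * minor(M, alpha + lam) * minor(N, gamma)
            values[(alpha, beta)] = total
    return values


def project(p):
    """A hypothetical affine projection (x, y, z) -> (x + z, y + 2z)."""
    x, y, z = p
    return (x + z, y + 2 * z)


obj = [(0, 0, 0), (1, 0, 0), (0, 1, 0), (0, 0, 1), (1, 1, 1), (2, -1, 3)]
img = [project(p) for p in obj]
print(all(v == 0 for v in object_image_relations(obj, img).values()))  # True
bad = img[:4] + [(2, 4)]  # move Q_5 away from its projected position (2, 3)
print(object_image_relations(obj[:5], bad)[((1, 2), ())])              # -1
```

For $r = 5$ and $(\alpha_1, \alpha_2) = (1, 2)$, the loop reduces to the three-term relation $m_{1234}n_{125} - m_{1235}n_{124} + m_{1245}n_{123}$.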

2.  THE  FULL  PERSPECTIVE  CASE 

We now turn our attention from generalized weak perspective projections to the so-called pinhole camera model. This is simply projection from a point $P$ in projective space $\mathbb{P}^3$ onto a hyperplane $H$ not containing $P$:

$$\pi : \mathbb{P}^3 - \{P\} \to H \cong \mathbb{P}^2.$$


This  case  becomes  much  more  difficult  since  we  are  now  considering  configurations  of  points  in  projective  space 
and  hence  are  allowed  to  scale  each  of  our  points  (homogeneous  coordinates)  by  an  arbitrary  nonzero  constant. 


We will consider $r$ points in projective 3-space, which we will again think of as feature points on an object. (There is a hyperplane that does not pass through any of these points, and the complement of that hyperplane in $\mathbb{P}^3$ is isomorphic to $\mathbb{R}^3$.) We will refer to this set of points as a projective object configuration, or simply an object configuration when it is clear that we are dealing with points in projective space.

Now "taking a picture" of the object is just projecting the object configuration from a point onto a hyperplane, which is isomorphic to $\mathbb{P}^2$. We refer to this type of projection as a full perspective projection. We call the image of a projective object configuration under such a projection a projective image configuration, or simply an image configuration.

When we view projection from a point as a map $\pi : \mathbb{P}^3 \to \mathbb{P}^2$, our projections take the form

$$Q = TP,$$

where $T$ is a $3 \times 4$ matrix of rank 3 and equality is in the sense of homogeneous coordinates. Conversely, every $3 \times 4$ matrix $T$ of rank 3 defines a projection from some point in this way. More precisely, this point is given by the 1-dimensional null space of $T$. (Remember that points in projective $n$-space are 1-dimensional subspaces of affine $(n+1)$-space.)

Let $Q = (R : S : T) \in \mathbb{P}^2$ be the image of $P = (X : Y : Z : W) \in \mathbb{P}^3$ under a full perspective projection $T$, so that $Q = TP$. Then for any nonzero $3 \times 3$ scalar matrix $A$ and any nonzero $4 \times 4$ scalar matrix $B$, we have $Q = (ATB)P$. Thus, the set of full perspective projections is equivalent to the set of $3 \times 4$ matrices of rank 3 up to multiplication on the left or right by a nonzero scalar matrix.
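A rank-3 $3 \times 4$ matrix has a one-dimensional null space, so the center of projection can be read off from $T$ directly. The sketch below is our illustration; the function names are hypothetical. It uses signed $3 \times 3$ minors instead of a general null-space routine. Its last check shows that scaling $T$ only rescales the center, consistent with the remark that $T$ matters only up to scalar matrices.

```python
def det3(m):
    """Determinant of a 3x3 matrix by cofactor expansion along the first row."""
    (a, b, c), (d, e, f), (g, h, i) = m
    return a * (e * i - f * h) - b * (d * i - f * g) + c * (d * h - e * g)


def projection_center(T):
    """Homogeneous coordinates of the center of a rank-3, 3x4 projection T.

    Set c_j = (-1)^j * det(T with column j deleted). For each row t of T,
    the 4x4 matrix with rows t, T has a repeated row. Expanding its
    determinant along the first row gives sum_j t_j c_j = 0, so T c = 0."""
    center = []
    for j in range(4):
        kept = [k for k in range(4) if k != j]
        center.append((-1) ** j * det3([[row[k] for k in kept] for row in T]))
    return center


def image_of(T, P):
    """Image T P of a point P = (X : Y : Z : W) in homogeneous coordinates."""
    return [sum(row[k] * P[k] for k in range(4)) for row in T]


T = [[1, 0, 0, 1],
     [0, 1, 0, 1],
     [0, 0, 1, 1]]
c = projection_center(T)
print(c)               # [1, 1, 1, -1]
print(image_of(T, c))  # [0, 0, 0]: the center has no image
```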

Now let $T$ be a full perspective projection. Let $A$ be any $3 \times 3$ matrix with $\det(A) \neq 0$ and let $B$ be any $4 \times 4$ matrix with $\det(B) \neq 0$. Then $ATB$ is again a $3 \times 4$ matrix of rank 3; that is, $ATB$ is again a full perspective projection.

As previously observed, if we multiply $A$ and $B$ by nonzero scalar matrices, the projection $ATB$ remains unchanged as a map between projective spaces. Thus, we should view $A$ as an element of $PGL(3)$ and $B$ as an element of $PGL(4)$. (In general, $PGL(k)$ is the quotient $GL(k)/S$, where $S$ is the subgroup of nonzero scalar matrices.)

The impact here is that the best we can hope to do is to relate object configurations up to a $PGL(4)$ transformation with image configurations up to a $PGL(3)$ transformation. Hence, when we have $r$ point features, our object shape space should be $U/PGL(4)$ for some open set $U \subset (\mathbb{P}^3)^r$. Our image shape space will be $W/PGL(3)$ for some open set $W \subset (\mathbb{P}^2)^r$.

2.1.  The  Associated  Variety  of  a  Configuration 

In the affine case, we were able to assign to each shape a distinct point in a fixed projective space. Unfortunately, in the full perspective case, our ability to scale the homogeneous coordinates of the points of our configurations complicates matters, so no convenient analogue of the affine shape coordinates is available. We circumvent this problem by instead assigning to each configuration a natural projective variety. Later in this paper, we will discuss the possibilities made available by using Chow forms to give global coordinates on our projective shape spaces.

Although ultimately we want to consider configurations of $r$ points in $\mathbb{P}^2$ and $\mathbb{P}^3$, let us begin by examining configurations of 4 points in $\mathbb{P}^1$. Let $P_i = (x_i : y_i) \in \mathbb{P}^1$ for $1 \le i \le 4$. We will assume that the points are not all the same point. In the spirit of the affine case, we make this configuration, with these representative homogeneous coordinates, into a matrix

$$\begin{pmatrix} x_1 & x_2 & x_3 & x_4 \\ y_1 & y_2 & y_3 & y_4 \end{pmatrix}.$$


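As in the affine case, this matrix corresponds to the point

$$(m_{12} : m_{13} : m_{14} : m_{23} : m_{24} : m_{34}) \in G(2,4) \subset \mathbb{P}^{\binom{4}{2}-1} = \mathbb{P}^5,$$

where

$$m_{ij} = \det\begin{pmatrix} x_i & x_j \\ y_i & y_j \end{pmatrix}, \qquad 1 \le i < j \le 4.$$

Since the points are not identical in $\mathbb{P}^1$, at least one of the $m_{ij}$ is nonzero.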

If for each $1 \le i \le 4$ we scale $P_i$ by a nonzero constant $a_i$, we have the same configuration, but our matrix is now

$$\begin{pmatrix} a_1x_1 & a_2x_2 & a_3x_3 & a_4x_4 \\ a_1y_1 & a_2y_2 & a_3y_3 & a_4y_4 \end{pmatrix},$$

which corresponds to the point

$$(a_1a_2m_{12} : a_1a_3m_{13} : a_1a_4m_{14} : a_2a_3m_{23} : a_2a_4m_{24} : a_3a_4m_{34}) \in G(2,4) \subset \mathbb{P}^5.$$

Thus, for a given configuration of 4 points in $\mathbb{P}^1$, we have a map $\Phi : (\mathbb{R}^*)^4 \to G(2,4)$ given by

$$\Phi(a_1, a_2, a_3, a_4) = (a_1a_2m_{12} : a_1a_3m_{13} : a_1a_4m_{14} : a_2a_3m_{23} : a_2a_4m_{24} : a_3a_4m_{34}).$$

Here $\mathbb{R}^*$ is the multiplicative group of nonzero elements of $\mathbb{R}$. Notice, however, that in $\mathbb{P}^5$

$$\Phi(a, a, a, a) = a^2(m_{12} : m_{13} : m_{14} : m_{23} : m_{24} : m_{34}) = (m_{12} : m_{13} : m_{14} : m_{23} : m_{24} : m_{34}).$$

So we in fact have a well-defined map

$$\Phi : (\mathbb{R}^*)^4/\mathbb{R}^* \cong (\mathbb{R}^*)^3 \to G(2,4).$$

We denote its image by $V(P_1, P_2, P_3, P_4) \subset G(2,4) \subset \mathbb{P}^5$, or simply $V$ when the configuration we are working with is understood. Thus, to each configuration we may assign the closure of $V$ in $\mathbb{P}^5$, which we will call the associated variety of the configuration.
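The map $\Phi$ is easy to experiment with. This sketch is our illustration; the names and sample points are hypothetical. It computes the $m_{ij}$ for four points of $\mathbb{P}^1$ and then checks three things:

- rescaling the representatives by $a_i$ is the same as applying $\Phi$;
- a constant scaling does not move the point of $\mathbb{P}^5$;
- a non-constant scaling does move it.

```python
from itertools import combinations


def pluecker_p1(points):
    """m_ij = x_i y_j - x_j y_i for 1 <= i < j <= r; points given as (x, y)."""
    return {(i + 1, j + 1): points[i][0] * points[j][1] - points[j][0] * points[i][1]
            for i, j in combinations(range(len(points)), 2)}


def phi(m, a):
    """The map Phi: scaling P_i by a_i sends m_ij to a_i a_j m_ij."""
    return {(i, j): a[i - 1] * a[j - 1] * v for (i, j), v in m.items()}


def proportional(u, v):
    """True if two coordinate vectors (dicts with equal keys) are proportional."""
    keys = sorted(u)
    return all(u[k] * v[l] == u[l] * v[k] for k in keys for l in keys)


P = [(1, 0), (0, 1), (1, 1), (2, 1)]
m = pluecker_p1(P)
a = (1, 2, 3, 4)
scaled = [(a_i * x, a_i * y) for a_i, (x, y) in zip(a, P)]
print(pluecker_p1(scaled) == phi(m, a))       # True: same as rescaling the columns
print(proportional(phi(m, (3, 3, 3, 3)), m))  # True: Phi(a, a, a, a) is the point m
print(proportional(phi(m, a), m))             # False: a different point of V
```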

PROPOSITION 2.1. Every configuration $P_1, P_2, P_3, P_4$ is assigned a unique variety $V(P_1, P_2, P_3, P_4)$. If two configurations $P_1, P_2, P_3, P_4$ and $P_1', P_2', P_3', P_4'$ have the same associated variety, then they differ by a $PGL(2)$ transformation, and hence give the same point in our shape space.

Proof. The fact that every configuration is assigned a unique variety is obvious. So suppose that for two configurations $P_i = (x_i : y_i)$, $1 \le i \le 4$, and $P_i' = (x_i' : y_i')$, $1 \le i \le 4$, we have $V(P_1, P_2, P_3, P_4) = V(P_1', P_2', P_3', P_4')$. Then for some $a_1, a_2, a_3, a_4 \in \mathbb{R}^*$,

$$(m_{12} : m_{13} : m_{14} : m_{23} : m_{24} : m_{34}) = (a_1a_2m'_{12} : a_1a_3m'_{13} : a_1a_4m'_{14} : a_2a_3m'_{23} : a_2a_4m'_{24} : a_3a_4m'_{34}),$$

where

$$m_{ij} = \det\begin{pmatrix} x_i & x_j \\ y_i & y_j \end{pmatrix}, \qquad m'_{ij} = \det\begin{pmatrix} x'_i & x'_j \\ y'_i & y'_j \end{pmatrix}.$$

So the null spaces of the matrices

$$\begin{pmatrix} x_1 & x_2 & x_3 & x_4 \\ y_1 & y_2 & y_3 & y_4 \end{pmatrix} \quad\text{and}\quad \begin{pmatrix} a_1x'_1 & a_2x'_2 & a_3x'_3 & a_4x'_4 \\ a_1y'_1 & a_2y'_2 & a_3y'_3 & a_4y'_4 \end{pmatrix}$$

give the same point under the Plücker embedding, and hence are in fact the same linear subspace of $\mathbb{R}^4$. Thus the matrices differ by the left action of a $GL(2)$ matrix. From this we see that the configurations $P_1, P_2, P_3, P_4$ and $P_1', P_2', P_3', P_4'$ differ by a $PGL(2)$ transformation. □

We have now placed our configurations $P_1, P_2, P_3, P_4$ (up to a $PGL(2)$ transformation) in one-to-one correspondence with the projective varieties $V(P_1, P_2, P_3, P_4)$. Next we would like to understand the relations that the points in $V$ must satisfy. So let $P_1, P_2, P_3, P_4 \in \mathbb{P}^1$, and let $(x_{12} : x_{13} : x_{14} : x_{23} : x_{24} : x_{34})$ be a point in $V = V(P_1, P_2, P_3, P_4)$. Then for some $a_1, a_2, a_3, a_4 \in \mathbb{R}^*$ the following must hold:

$$\begin{aligned}
x_{12} - a_1a_2m_{12} &= 0\\
x_{13} - a_1a_3m_{13} &= 0\\
x_{14} - a_1a_4m_{14} &= 0\\
x_{23} - a_2a_3m_{23} &= 0\\
x_{24} - a_2a_4m_{24} &= 0\\
x_{34} - a_3a_4m_{34} &= 0
\end{aligned}$$

Using Groebner bases, we eliminate the $a_i$ from this system and obtain the following theorem.

THEOREM 2.2. $V$ is the zero locus of three polynomials in the variables $x_{12}, \ldots, x_{34}$:

$$\begin{aligned}
f_1 &= m_{12}m_{34}\,x_{13}x_{24} - m_{13}m_{24}\,x_{12}x_{34}\\
f_2 &= m_{12}m_{34}\,x_{14}x_{23} - m_{14}m_{23}\,x_{12}x_{34}\\
f_3 &= m_{13}m_{24}\,x_{14}x_{23} - m_{14}m_{23}\,x_{13}x_{24}
\end{aligned}$$


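These same relations can also be obtained by observing the following. If $i_1, i_2, i_3, i_4$ and $j_1, j_2, j_3, j_4$ are two permutations of $1, 2, 3, 4$, then

$$\frac{m_{i_1i_2}m_{i_3i_4}\,x_{j_1j_2}x_{j_3j_4}}{m_{j_1j_2}m_{j_3j_4}\,x_{i_1i_2}x_{i_3i_4}} = \frac{m_{i_1i_2}m_{i_3i_4}\,(a_{j_1}a_{j_2}m_{j_1j_2})(a_{j_3}a_{j_4}m_{j_3j_4})}{m_{j_1j_2}m_{j_3j_4}\,(a_{i_1}a_{i_2}m_{i_1i_2})(a_{i_3}a_{i_4}m_{i_3i_4})} = 1,$$

since both products of the $a$'s equal $a_1a_2a_3a_4$.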

We should note that $(m_{12} : m_{13} : m_{14} : m_{23} : m_{24} : m_{34})$ and $(x_{12} : x_{13} : x_{14} : x_{23} : x_{24} : x_{34})$ are points in $G(2,4) \subset \mathbb{P}^5$. Hence the Plücker relations

$$\begin{aligned}
p_1 &= m_{12}m_{34} - m_{13}m_{24} + m_{14}m_{23} = 0\\
p_2 &= x_{12}x_{34} - x_{13}x_{24} + x_{14}x_{23} = 0
\end{aligned}$$

are satisfied. It is easily seen from these relations that, as long as enough of the $m_{ij}$ are nonzero, we have $V(f_1) = V(f_2) = V(f_3)$ as subvarieties of $G(2,4)$. Hence $V$ is defined as the zero locus of any one of $f_1, f_2, f_3$. In particular, $V$ is a hypersurface in $G(2,4)$ and so has dimension $\dim(V) = \dim(G(2,4)) - 1 = 3$.
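Theorem 2.2 and the Plücker relations can be spot-checked numerically. In this sketch, which is our illustration with hypothetical names and sample points, a point of $V$ is produced by rescaling the representatives. A second configuration with a different cross-ratio gives a point of $G(2,4)$ that satisfies the Plücker relation but not $f_1$.

```python
from itertools import combinations


def pluecker_p1(points):
    """m_ij = x_i y_j - x_j y_i for 1 <= i < j <= r; points given as (x, y)."""
    return {(i + 1, j + 1): points[i][0] * points[j][1] - points[j][0] * points[i][1]
            for i, j in combinations(range(len(points)), 2)}


def f_polynomials(m, x):
    """The polynomials f_1, f_2, f_3 of Theorem 2.2 evaluated at (m, x)."""
    f1 = m[1, 2] * m[3, 4] * x[1, 3] * x[2, 4] - m[1, 3] * m[2, 4] * x[1, 2] * x[3, 4]
    f2 = m[1, 2] * m[3, 4] * x[1, 4] * x[2, 3] - m[1, 4] * m[2, 3] * x[1, 2] * x[3, 4]
    f3 = m[1, 3] * m[2, 4] * x[1, 4] * x[2, 3] - m[1, 4] * m[2, 3] * x[1, 3] * x[2, 4]
    return f1, f2, f3


def pluecker_relation(c):
    """c_12 c_34 - c_13 c_24 + c_14 c_23, which vanishes on G(2, 4)."""
    return c[1, 2] * c[3, 4] - c[1, 3] * c[2, 4] + c[1, 4] * c[2, 3]


P = [(1, 0), (0, 1), (1, 1), (2, 1)]
m = pluecker_p1(P)
# A point of V: the Pluecker coordinates of the same points rescaled by a_i.
x = pluecker_p1([(a * px, a * py) for a, (px, py) in zip((1, 2, 3, 4), P)])
print(f_polynomials(m, x), pluecker_relation(m), pluecker_relation(x))  # (0, 0, 0) 0 0
# A point of G(2, 4) from a configuration with a different cross-ratio.
y = pluecker_p1([(1, 0), (0, 1), (1, 1), (3, 1)])
print(pluecker_relation(y), f_polynomials(m, y)[0])                     # 0 -1
```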

All of the preceding discussion can easily be generalized to the case of $r \ge n + 2$ points in $\mathbb{P}^n$. For each configuration $P_1, \ldots, P_r$ of $r$ points in $\mathbb{P}^n$, we have a map $\Phi : (\mathbb{R}^*)^r/\mathbb{R}^* \to G(n+1, r)$. It is obtained by constructing a matrix whose columns are representatives of $P_1, \ldots, P_r$ in $\mathbb{R}^{n+1}$ and then scaling the columns of that matrix. We denote the image of $\Phi$ by $V(P_1, \ldots, P_r)$. Thus, we place the configurations $P_1, \ldots, P_r$ of $r$ points in $\mathbb{P}^n$ (up to a $PGL(n+1)$ transformation) in one-to-one correspondence with the projective varieties $V(P_1, \ldots, P_r)$.

Explicitly, the map $\Phi : (\mathbb{R}^*)^r/\mathbb{R}^* \to G(n+1, r)$ is given by

$$\Phi(a_1, \ldots, a_r) = (a_{I_1}m_{I_1} : \cdots : a_{I_N}m_{I_N}),$$

where $I_1, \ldots, I_N$ are the $(n+1)$-subsets of $\{1, \ldots, r\}$, $N = \binom{r}{n+1}$, and $a_{I_k} = \prod_{i \in I_k} a_i$. We would like to know for which configurations $\Phi$ is one-to-one. In other words, we would like to know for which configurations we have

$$(a_{I_1}m_{I_1} : \cdots : a_{I_N}m_{I_N}) = (m_{I_1} : \cdots : m_{I_N}) \iff a_i = a_j \text{ for all } i, j.$$

The following theorem gives a large set of configurations (but not necessarily all) for which $\Phi$ is one-to-one.

THEOREM 2.3. Suppose $P_1, \ldots, P_r$ is a configuration of $r$ points in $\mathbb{P}^n$ such that there is a subset $P_{i_1}, \ldots, P_{i_{n+2}}$ of $n+2$ points in this configuration having the following properties:

1. for every subset $J = \{j_1, \ldots, j_{n+1}\} \subset \{i_1, \ldots, i_{n+2}\}$, the points $P_{j_1}, \ldots, P_{j_{n+1}}$ do not lie in a single hyperplane (i.e. $m_J \neq 0$);

2. there is some subset $K = \{k_1, \ldots, k_n\} \subset \{i_1, \ldots, i_{n+2}\}$ such that, for all $P_s$ not in the set $\{P_{i_1}, \ldots, P_{i_{n+2}}\}$, the points $P_{k_1}, \ldots, P_{k_n}, P_s$ do not all lie in a single hyperplane (i.e. $m_{K \cup \{s\}} \neq 0$).

Then the map $\Phi$ is injective.

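Proof. We will show that under these conditions

$$(a_{I_1}m_{I_1} : \cdots : a_{I_N}m_{I_N}) = (m_{I_1} : \cdots : m_{I_N}) \implies a_i = a_j \text{ for all } i, j.$$

Note that

$$(a_{I_1}m_{I_1} : \cdots : a_{I_N}m_{I_N}) = (m_{I_1} : \cdots : m_{I_N})$$

if and only if

$$\frac{a_{I_i}m_{I_i}}{a_{I_j}m_{I_j}} = \frac{m_{I_i}}{m_{I_j}}$$

for all $i \neq j$, assuming of course that $m_{I_j} \neq 0$.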

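First, let $\alpha, \beta \in \{1, \ldots, r\}$ be such that neither $\alpha$ nor $\beta$ is in the set $\{i_1, \ldots, i_{n+2}\}$. Let $A = \{k_1, \ldots, k_n, \alpha\}$ and $B = \{k_1, \ldots, k_n, \beta\}$. By condition 2, $m_A \neq 0$ and $m_B \neq 0$. Since

$$\frac{a_A m_A}{a_B m_B} = \frac{m_A}{m_B},$$

and the factors $a_{k_1}, \ldots, a_{k_n}$ cancel, we have

$$\frac{a_\alpha}{a_\beta} = \frac{a_A}{a_B} = 1,$$

and hence $a_\alpha = a_\beta$.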

Now, let $\alpha, \beta$ be such that $\alpha$ is in $\{i_1, \ldots, i_{n+2}\}$ but $\beta$ is not. Choose $j_1, \ldots, j_n$ in $\{i_1, \ldots, i_{n+2}\}$ so that $j_s \neq j_t$ if $s \neq t$ and $j_s \neq \alpha$ for all $s$. Let $A = \{j_1, \ldots, j_n, \alpha\}$ and let $B = \{k_1, \ldots, k_n, \beta\}$. Then by condition 1, $m_A \neq 0$, and by condition 2, $m_B \neq 0$. Thus, as above, we again get $a_\alpha = a_\beta$.

A similar argument shows that if $\alpha$ and $\beta$ are both in $\{i_1, \ldots, i_{n+2}\}$, then $a_\alpha = a_\beta$. Thus, under conditions 1 and 2, the map $\Phi$ is one-to-one. □

We see now that for configurations $P_1, \ldots, P_r$ satisfying conditions (1) and (2), $V(P_1, \ldots, P_r)$ is isomorphic to $(\mathbb{R}^*)^r/\mathbb{R}^* \cong (\mathbb{R}^*)^{r-1}$. In particular, $\dim V(P_1, \ldots, P_r) = r - 1$, which is consistent with our result in the case of 4 points in $\mathbb{P}^1$.
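Conditions 1 and 2 are finite nonvanishing checks on minors, so they can be tested by brute force. The sketch below is our illustration; the function name is hypothetical. It reads condition 2 as quantifying over the points outside $\{P_{i_1}, \ldots, P_{i_{n+2}}\}$. It only certifies the sufficient condition of the theorem: a configuration that fails the test may still have an injective $\Phi$.

```python
from fractions import Fraction
from itertools import combinations


def det(rows):
    """Exact determinant of a square matrix by Gaussian elimination over Q."""
    a = [[Fraction(x) for x in row] for row in rows]
    n, result = len(a), Fraction(1)
    for col in range(n):
        pivot = next((i for i in range(col, n) if a[i][col] != 0), None)
        if pivot is None:
            return Fraction(0)
        if pivot != col:
            a[col], a[pivot] = a[pivot], a[col]
            result = -result
        result *= a[col][col]
        for i in range(col + 1, n):
            factor = a[i][col] / a[col][col]
            for j in range(col, n):
                a[i][j] -= factor * a[col][j]
    return result


def satisfies_theorem_2_3(columns):
    """Search for a subset of n + 2 points meeting conditions 1 and 2.

    `columns` lists homogeneous coordinates of P_1, ..., P_r in R^(n+1).
    Returns the 1-based subset found, or None if there is none."""
    r, n = len(columns), len(columns[0]) - 1

    def m(idx):
        return det([[columns[i - 1][k] for i in idx] for k in range(n + 1)])

    for S in combinations(range(1, r + 1), n + 2):
        if any(m(J) == 0 for J in combinations(S, n + 1)):
            continue  # condition 1 fails for this S
        others = [s for s in range(1, r + 1) if s not in S]
        for K in combinations(S, n):
            if all(m(tuple(sorted(K + (s,)))) != 0 for s in others):
                return S  # condition 2 holds with this K
    return None


print(satisfies_theorem_2_3([(1, 0), (0, 1), (1, 1), (2, 1)]))  # (1, 2, 3)
print(satisfies_theorem_2_3([(1, 0), (1, 0), (1, 0), (0, 1)]))  # None
```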

When we compute the defining equations of $V(P_1, \ldots, P_r)$, we do get a slight variation from the case of 4 points in $\mathbb{P}^1$. Consider the case of 5 points $P_1, \ldots, P_5$ in $\mathbb{P}^1$. For a point $(x_{12} : \cdots : x_{45})$ in $V(P_1, \ldots, P_5)$, we have, for some $a_1, \ldots, a_5 \in \mathbb{R}^*$,

$$\frac{m_{12}m_{13}m_{45}\,x_{14}x_{13}x_{25}}{m_{14}m_{13}m_{25}\,x_{12}x_{13}x_{45}} = \frac{m_{12}m_{13}m_{45}\,(a_1a_4m_{14})(a_1a_3m_{13})(a_2a_5m_{25})}{m_{14}m_{13}m_{25}\,(a_1a_2m_{12})(a_1a_3m_{13})(a_4a_5m_{45})} = 1,$$

giving us the relation

$$m_{12}m_{13}m_{45}\,x_{14}x_{13}x_{25} - m_{14}m_{13}m_{25}\,x_{12}x_{13}x_{45} = 0.$$

So in general the indices will contain some repeated entries (here the index 1 appears twice in each product), even though in the case of 4 points in $\mathbb{P}^1$ they did not.

THEOREM 2.4. For a configuration $P_1, \ldots, P_r$ of $r$ points in $\mathbb{P}^n$, the variety $V(P_1, \ldots, P_r)$ is the zero locus of the following system of polynomials:

$$m_{I_1}m_{I_2}\cdots m_{I_k}\; x_{J_1}x_{J_2}\cdots x_{J_k} - m_{J_1}m_{J_2}\cdots m_{J_k}\; x_{I_1}x_{I_2}\cdots x_{I_k}.$$

Here $I_1, \ldots, I_k, J_1, \ldots, J_k$ range over all $(n+1)$-subsets of $\{1, \ldots, r\}$ with the property that, as multisets,

$$\biguplus_{\ell=1}^{k} I_\ell = \biguplus_{\ell=1}^{k} J_\ell,$$

and $k$ ranges from 2 to some positive integer $N(r)$. The exact value of $N(r)$ is not known, but computation of some small examples seems to indicate that $N(r) = r - 2$.
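The binomials of Theorem 2.4 are determined purely by index bookkeeping. This sketch is our illustration; the names, points and scalars are hypothetical. It groups $k$-tuples of $(n+1)$-subsets by their index multiset, forms all pairs within each group, and checks that every resulting binomial vanishes at a point of $V$.

It does not verify the theorem's claim that these binomials cut out $V$ exactly, nor the conjectured value of $N(r)$.

```python
from collections import defaultdict
from itertools import combinations, combinations_with_replacement


def binomial_index_pairs(r, n, k):
    """Pairs (I_1..I_k), (J_1..J_k) of (n+1)-subsets of {1..r} with equal
    index multisets; each unordered pair is listed once."""
    subsets = list(combinations(range(1, r + 1), n + 1))
    groups = defaultdict(list)
    for I in combinations_with_replacement(subsets, k):
        # The index multiset, represented as a sorted tuple.
        groups[tuple(sorted(i for s in I for i in s))].append(I)
    for members in groups.values():
        yield from combinations(members, 2)


def evaluate(m, x, I, J):
    """m_{I_1}...m_{I_k} x_{J_1}...x_{J_k} - m_{J_1}...m_{J_k} x_{I_1}...x_{I_k}."""
    lhs = rhs = 1
    for s in I:
        lhs *= m[s]
        rhs *= x[s]
    for s in J:
        lhs *= x[s]
        rhs *= m[s]
    return lhs - rhs


def pluecker_p1(points):
    """m_ij = x_i y_j - x_j y_i for 1 <= i < j <= r; points given as (x, y)."""
    return {(i + 1, j + 1): points[i][0] * points[j][1] - points[j][0] * points[i][1]
            for i, j in combinations(range(len(points)), 2)}


P = [(1, 0), (0, 1), (1, 1), (2, 1), (3, 2)]  # r = 5 points in P^1
a = (2, -1, 3, 5, 7)
m = pluecker_p1(P)
x = {ij: a[ij[0] - 1] * a[ij[1] - 1] * v for ij, v in m.items()}  # a point of V
pairs = [p for k in (2, 3) for p in binomial_index_pairs(5, 1, k)]
print(all(evaluate(m, x, I, J) == 0 for I, J in pairs))                # True
print((((1, 2), (1, 3), (4, 5)), ((1, 3), (1, 4), (2, 5))) in pairs)   # True
print(len(list(binomial_index_pairs(4, 1, 2))))  # 3
```

For four points ($r = 4$, $k = 2$), the three pairs found are exactly the $f_1, f_2, f_3$ of Theorem 2.2.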


3. THE PROJECTIVE OBJECT-IMAGE RELATIONS

Given a projective object configuration $P_1, \ldots, P_r$ and a projective image configuration $Q_1, \ldots, Q_r$, we want (as in the affine case) to find necessary and sufficient conditions for the $Q_i$ to be a full perspective projection of the $P_i$.

Every object configuration (fixing its homogeneous coordinates) gives a point in $G(4, r) \subset \mathbb{P}^{\binom{r}{4}-1}$. Every image configuration (fixing its homogeneous coordinates) gives a point in $G(3, r) \subset \mathbb{P}^{\binom{r}{3}-1}$. So the closure of the set of matching object-image pairs should be a projective variety defined by a system of bihomogeneous polynomials in two sets of coordinates: the Plücker coordinates $m_{1234}, \ldots, m_{(r-3)(r-2)(r-1)r}$ on $G(4, r)$ and the Plücker coordinates $n_{123}, \ldots, n_{(r-2)(r-1)r}$ on $G(3, r)$.

These relations should be satisfied independent of our choice of representatives (homogeneous coordinates) for our object and image configurations. In other words, suppose an image configuration $Q_1, \ldots, Q_r$ is a full perspective projection of an object configuration $P_1, \ldots, P_r$. Then the product variety $V(P_1, \ldots, P_r) \times V(Q_1, \ldots, Q_r)$ should be completely contained in this variety of matching pairs.

Now, consider an object configuration $P_1, \ldots, P_r$ with $P_1, P_2, P_3, P_4, P_5$ in general position. We may then move the configuration by a projective transformation so that

$$P_1 = (1:0:0:0),\quad P_2 = (0:1:0:0),\quad P_3 = (0:0:1:0),\quad P_4 = (0:0:0:1),\quad P_5 = (1:1:1:1).$$

Assume also that for all $i \ge 6$, $P_i$ does not lie in the plane defined by $P_1, P_2, P_3$, so that $P_i = (p_{3i-17} : p_{3i-16} : p_{3i-15} : 1)$. It turns out that we can write $p_1, \ldots, p_{3r-15}$ in terms of Plücker coordinates in the following way:

$$p_{3i-17} = \frac{m_{234i}\,m_{1235}}{m_{123i}\,m_{2345}}, \qquad p_{3i-16} = \frac{m_{134i}\,m_{1235}}{m_{123i}\,m_{1345}}, \qquad p_{3i-15} = \frac{m_{124i}\,m_{1235}}{m_{123i}\,m_{1245}}.$$

Note that the $p_j$ are defined independently of our choice of representatives of $P_1, \ldots, P_r$. Indeed, if we scale each $P_i$ by a nonzero constant $a_i$, we get

$$\begin{aligned}
\frac{(a_2a_3a_4a_i\,m_{234i})(a_1a_2a_3a_5\,m_{1235})}{(a_1a_2a_3a_i\,m_{123i})(a_2a_3a_4a_5\,m_{2345})} &= \frac{m_{234i}\,m_{1235}}{m_{123i}\,m_{2345}},\\[4pt]
\frac{(a_1a_3a_4a_i\,m_{134i})(a_1a_2a_3a_5\,m_{1235})}{(a_1a_2a_3a_i\,m_{123i})(a_1a_3a_4a_5\,m_{1345})} &= \frac{m_{134i}\,m_{1235}}{m_{123i}\,m_{1345}},\\[4pt]
\frac{(a_1a_2a_4a_i\,m_{124i})(a_1a_2a_3a_5\,m_{1235})}{(a_1a_2a_3a_i\,m_{123i})(a_1a_2a_4a_5\,m_{1245})} &= \frac{m_{124i}\,m_{1235}}{m_{123i}\,m_{1245}}.
\end{aligned}$$

The values $p_1, \ldots, p_{3r-15}$ form a fundamental set of invariants for our object configuration.
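The Plücker formulas for the $p_j$ can be checked in two ways. First, on a configuration already in the normal position above, they must return the coordinates of $P_6$. Second, on a copy moved by an invertible $4 \times 4$ matrix with each point rescaled, they must return the same values. This is our illustration; the function names, the matrix `G` and the scalars are hypothetical.

```python
from fractions import Fraction


def det(rows):
    """Exact determinant of a square matrix by Gaussian elimination over Q."""
    a = [[Fraction(x) for x in row] for row in rows]
    n, result = len(a), Fraction(1)
    for col in range(n):
        pivot = next((i for i in range(col, n) if a[i][col] != 0), None)
        if pivot is None:
            return Fraction(0)
        if pivot != col:
            a[col], a[pivot] = a[pivot], a[col]
            result = -result
        result *= a[col][col]
        for i in range(col + 1, n):
            factor = a[i][col] / a[col][col]
            for j in range(col, n):
                a[i][j] -= factor * a[col][j]
    return result


def m(P, idx):
    """Pluecker coordinate m_idx of homogeneous points P (1-based indices)."""
    return det([[P[i - 1][k] for i in idx] for k in range(4)])


def object_invariants(P):
    """p_1, ..., p_{3r-15} from the Pluecker-coordinate formulas."""
    out = []
    for i in range(6, len(P) + 1):
        scale = m(P, (1, 2, 3, 5)) / m(P, (1, 2, 3, i))
        out += [m(P, (2, 3, 4, i)) * scale / m(P, (2, 3, 4, 5)),
                m(P, (1, 3, 4, i)) * scale / m(P, (1, 3, 4, 5)),
                m(P, (1, 2, 4, i)) * scale / m(P, (1, 2, 4, 5))]
    return out


normalized = [(1, 0, 0, 0), (0, 1, 0, 0), (0, 0, 1, 0), (0, 0, 0, 1),
              (1, 1, 1, 1), (2, 3, 5, 1)]
G = [[1, 2, 0, 1], [0, 1, 1, 0], [1, 0, 1, 2], [0, 1, 0, 1]]  # det(G) = 4
scales = [1, 2, 3, -1, 5, 7]
moved = [[s * sum(G[row][k] * p[k] for k in range(4)) for row in range(4)]
         for s, p in zip(scales, normalized)]
print(object_invariants(normalized) == [2, 3, 5])  # True
print(object_invariants(moved) == [2, 3, 5])       # True
```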

Similarly, let $Q_1, \ldots, Q_r$ be an image configuration with $Q_1, Q_2, Q_3, Q_4$ in general position, and such that for $i \ge 5$, $Q_i$ is not on the line defined by $Q_1$ and $Q_2$. We move the configuration by a projective transformation so that

$$Q_1 = (1:0:0),\quad Q_2 = (0:1:0),\quad Q_3 = (0:0:1),\quad Q_4 = (1:1:1),$$

and, for each $i \ge 5$, $Q_i = (q_{2i-9} : q_{2i-8} : 1)$.

The projective invariants $q_1, \ldots, q_{2r-8}$ are again defined independently of our choice of representatives. In Plücker coordinates they are

$$q_{2i-9} = \frac{n_{23i}\,n_{124}}{n_{12i}\,n_{234}}, \qquad q_{2i-8} = \frac{n_{13i}\,n_{124}}{n_{12i}\,n_{134}}.$$
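The same kind of check applies to the image invariants. This is our illustration; the function names, the matrix `H` and the scalars are hypothetical. The formulas for $q_{2i-9}$ and $q_{2i-8}$ are evaluated on a normalized image configuration and on a projectively moved, rescaled copy of it.

```python
from fractions import Fraction


def det3(m):
    """Exact 3x3 determinant by cofactor expansion along the first row."""
    (a, b, c), (d, e, f), (g, h, i) = m
    return Fraction(a * (e * i - f * h) - b * (d * i - f * g) + c * (d * h - e * g))


def n(Q, idx):
    """Pluecker coordinate n_idx of homogeneous image points Q (1-based)."""
    return det3([[Q[j - 1][k] for j in idx] for k in range(3)])


def image_invariants(Q):
    """q_1, ..., q_{2r-8} from the Pluecker-coordinate formulas."""
    out = []
    for i in range(5, len(Q) + 1):
        scale = n(Q, (1, 2, 4)) / n(Q, (1, 2, i))
        out += [n(Q, (2, 3, i)) * scale / n(Q, (2, 3, 4)),
                n(Q, (1, 3, i)) * scale / n(Q, (1, 3, 4))]
    return out


normalized = [(1, 0, 0), (0, 1, 0), (0, 0, 1), (1, 1, 1), (2, 7, 1)]
H = [[2, 1, 0], [0, 1, 1], [1, 0, 1]]  # det(H) = 3
scales = [1, -2, 3, 4, 5]
moved = [[s * sum(H[row][k] * q[k] for k in range(3)) for row in range(3)]
         for s, q in zip(scales, normalized)]
print(image_invariants(normalized) == [2, 7])  # True
print(image_invariants(moved) == [2, 7])       # True
```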


When we make the preceding assumptions about the positioning of our configurations, the object-image equations have been completely determined [12]. For example, in the case where $r = 6$, we have only one object-image relation. In terms of the projective invariants it is

$$-q_2q_3p_2p_3 + q_3p_2p_3 - q_3p_3 - q_1p_1p_2 + q_1p_1 - q_1q_4p_1 + q_1q_4p_1p_3 - q_4p_1p_3 + q_4p_3 + q_2q_3p_2 + q_2p_1p_2 - q_2p_2 = 0.$$


Making the appropriate substitutions, clearing denominators and removing monomial factors, we obtain the object-image relation in terms of the Plücker coordinates:

$$\begin{aligned}
& n_{125}n_{136}n_{234}\,m_{1236}m_{1246}m_{1345}m_{2345} - n_{126}n_{135}n_{234}\,m_{1236}m_{1245}m_{1346}m_{2345}\\
&+ n_{125}n_{134}n_{236}\,m_{1235}m_{1246}m_{1346}m_{2345} + n_{126}n_{134}n_{235}\,m_{1236}m_{1245}m_{1345}m_{2346}\\
&- n_{125}n_{136}n_{234}\,m_{1235}m_{1246}m_{1345}m_{2346} + n_{126}n_{135}n_{234}\,m_{1235}m_{1245}m_{1346}m_{2346}\\
&- n_{125}n_{134}n_{236}\,m_{1236}m_{1246}m_{1345}m_{2345} + n_{124}n_{135}n_{236}\,m_{1236}m_{1245}m_{1346}m_{2345}\\
&- n_{124}n_{135}n_{236}\,m_{1235}m_{1246}m_{1346}m_{2345} - n_{124}n_{136}n_{235}\,m_{1236}m_{1245}m_{1345}m_{2346}\\
&+ n_{124}n_{136}n_{235}\,m_{1235}m_{1246}m_{1345}m_{2346} - n_{126}n_{134}n_{235}\,m_{1235}m_{1245}m_{1346}m_{2346} = 0.
\end{aligned}$$


Since the $p_j$ and $q_j$ are defined independently of our choice of representatives for the $P_i$ and $Q_i$, this relation will be satisfied independently of that choice.
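A numerical spot check of the $r = 6$ relation is straightforward. The sketch below is our illustration; the function names and sample data are hypothetical, and `TERMS` transcribes the twelve terms of the Plücker-coordinate relation for $r = 6$. The relation is evaluated in two situations:

- for object configurations and their images under rank-3 camera matrices, where it should vanish;
- after moving one image point off its projected position, where in this example it does not.

It exercises only this single relation and is not the method of [12].

```python
from fractions import Fraction


def det(rows):
    """Exact determinant of a square matrix by Gaussian elimination over Q."""
    a = [[Fraction(x) for x in row] for row in rows]
    n, result = len(a), Fraction(1)
    for col in range(n):
        pivot = next((i for i in range(col, n) if a[i][col] != 0), None)
        if pivot is None:
            return Fraction(0)
        if pivot != col:
            a[col], a[pivot] = a[pivot], a[col]
            result = -result
        result *= a[col][col]
        for i in range(col + 1, n):
            factor = a[i][col] / a[col][col]
            for j in range(col, n):
                a[i][j] -= factor * a[col][j]
    return result


def minor(points, idx):
    """Pluecker coordinate of homogeneous points (as columns) at 1-based idx."""
    return det([[points[i - 1][k] for i in idx] for k in range(len(points[0]))])


# (sign, image indices n_..., object indices m_...) for the twelve terms.
TERMS = [
    (+1, ("125", "136", "234"), ("1236", "1246", "1345", "2345")),
    (-1, ("126", "135", "234"), ("1236", "1245", "1346", "2345")),
    (+1, ("125", "134", "236"), ("1235", "1246", "1346", "2345")),
    (+1, ("126", "134", "235"), ("1236", "1245", "1345", "2346")),
    (-1, ("125", "136", "234"), ("1235", "1246", "1345", "2346")),
    (+1, ("126", "135", "234"), ("1235", "1245", "1346", "2346")),
    (-1, ("125", "134", "236"), ("1236", "1246", "1345", "2345")),
    (+1, ("124", "135", "236"), ("1236", "1245", "1346", "2345")),
    (-1, ("124", "135", "236"), ("1235", "1246", "1346", "2345")),
    (-1, ("124", "136", "235"), ("1236", "1245", "1345", "2346")),
    (+1, ("124", "136", "235"), ("1235", "1246", "1345", "2346")),
    (-1, ("126", "134", "235"), ("1235", "1245", "1346", "2346")),
]


def relation_r6(P, Q):
    """Evaluate the r = 6 object-image relation at object P and image Q."""
    total = Fraction(0)
    for sign, n_idx, m_idx in TERMS:
        term = Fraction(sign)
        for s in n_idx:
            term *= minor(Q, tuple(int(ch) for ch in s))
        for s in m_idx:
            term *= minor(P, tuple(int(ch) for ch in s))
        total += term
    return total


def project(T, points):
    """Full perspective projection Q_i = T P_i in homogeneous coordinates."""
    return [[sum(row[k] * p[k] for k in range(4)) for row in T] for p in points]


P0 = [(1, 0, 0, 0), (0, 1, 0, 0), (0, 0, 1, 0), (0, 0, 0, 1), (1, 1, 1, 1), (2, 3, 5, 1)]
T0 = [[1, 0, 0, 1], [0, 1, 0, 1], [0, 0, 1, 1]]
Q0 = project(T0, P0)                                # Q_6 = (3, 4, 6)
print(relation_r6(P0, Q0))                          # 0
print(relation_r6(P0, Q0[:5] + [[3, 4, 7]]) != 0)   # True: Q_6 moved

P1 = [(1, 0, 0, 1), (0, 1, 0, 1), (1, 1, 1, 1), (2, 0, 1, 1), (0, 3, 1, 2), (1, 2, 3, 1)]
T1 = [[1, 2, 0, 1], [0, 1, 3, -1], [2, 0, 1, 1]]    # rank 3
print(relation_r6(P1, project(T1, P1)))             # 0
```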

Now let $\sigma$ be a permutation of $1, \ldots, r$. Suppose that in our object configuration $P_1, \ldots, P_r$ the points $P_{\sigma(1)}, P_{\sigma(2)}, P_{\sigma(3)}, P_{\sigma(4)}, P_{\sigma(5)}$ are in general position, and that for all $k \ge 6$, $P_{\sigma(k)}$ is not in the span of $P_{\sigma(1)}, P_{\sigma(2)}, P_{\sigma(3)}$. Then we may move our configuration by a projective transformation so that

$$P_{\sigma(1)} = (1:0:0:0),\quad P_{\sigma(2)} = (0:1:0:0),\quad P_{\sigma(3)} = (0:0:1:0),\quad P_{\sigma(4)} = (0:0:0:1),\quad P_{\sigma(5)} = (1:1:1:1),$$

and, for $k \ge 6$, $P_{\sigma(k)} = (p'_{3k-17} : p'_{3k-16} : p'_{3k-15} : 1)$.

Similarly, let $\tau$ be a permutation of $1, \ldots, r$. Suppose that in our image configuration $Q_1, \ldots, Q_r$ the points $Q_{\tau(1)}, Q_{\tau(2)}, Q_{\tau(3)}, Q_{\tau(4)}$ are in general position, and that for all $k \ge 5$, $Q_{\tau(k)}$ is not in the span of $Q_{\tau(1)}$ and $Q_{\tau(2)}$. We now move $Q_1, \ldots, Q_r$ by a projective transformation so that

$$Q_{\tau(1)} = (1:0:0),\quad Q_{\tau(2)} = (0:1:0),\quad Q_{\tau(3)} = (0:0:1),\quad Q_{\tau(4)} = (1:1:1),$$

and, for $k \ge 5$, $Q_{\tau(k)} = (q'_{2k-9} : q'_{2k-8} : 1)$.

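We now have a new set of object invariants $p'_1, \ldots, p'_{3r-15}$ and a new set of image invariants $q'_1, \ldots, q'_{2r-8}$. As before, they may be written in terms of Plücker coordinates:

$$p'_{3i-17} = \frac{m_{\sigma(2)\sigma(3)\sigma(4)\sigma(i)}\; m_{\sigma(1)\sigma(2)\sigma(3)\sigma(5)}}{m_{\sigma(1)\sigma(2)\sigma(3)\sigma(i)}\; m_{\sigma(2)\sigma(3)\sigma(4)\sigma(5)}},$$

$$p'_{3i-16} = \frac{m_{\sigma(1)\sigma(3)\sigma(4)\sigma(i)}\; m_{\sigma(1)\sigma(2)\sigma(3)\sigma(5)}}{m_{\sigma(1)\sigma(2)\sigma(3)\sigma(i)}\; m_{\sigma(1)\sigma(3)\sigma(4)\sigma(5)}},$$

$$p'_{3i-15} = \frac{m_{\sigma(1)\sigma(2)\sigma(4)\sigma(i)}\; m_{\sigma(1)\sigma(2)\sigma(3)\sigma(5)}}{m_{\sigma(1)\sigma(2)\sigma(3)\sigma(i)}\; m_{\sigma(1)\sigma(2)\sigma(4)\sigma(5)}},$$

$$q'_{2i-9} = \frac{n_{\tau(2)\tau(3)\tau(i)}\; n_{\tau(1)\tau(2)\tau(4)}}{n_{\tau(1)\tau(2)\tau(i)}\; n_{\tau(2)\tau(3)\tau(4)}}, \qquad q'_{2i-8} = \frac{n_{\tau(1)\tau(3)\tau(i)}\; n_{\tau(1)\tau(2)\tau(4)}}{n_{\tau(1)\tau(2)\tau(i)}\; n_{\tau(1)\tau(3)\tau(4)}}.$$

Keep in mind that we view the $m$'s and the $n$'s as skew-symmetric in their indices.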

Using the method of [12], we get a new set of object-image relations in terms of the new invariants, which we may again write in terms of Plücker coordinates.

Our projective transformations are completely determined by two choices:

- sending $P_{\sigma(1)}, P_{\sigma(2)}, P_{\sigma(3)}, P_{\sigma(4)}, P_{\sigma(5)}$ to $(1:0:0:0)$, $(0:1:0:0)$, $(0:0:1:0)$, $(0:0:0:1)$, $(1:1:1:1)$ respectively;
- sending $Q_{\tau(1)}, Q_{\tau(2)}, Q_{\tau(3)}, Q_{\tau(4)}$ to $(1:0:0)$, $(0:1:0)$, $(0:0:1)$, $(1:1:1)$ respectively.

We may therefore assume that $\sigma(6) < \cdots < \sigma(r)$ and that $\tau(5) < \cdots < \tau(r)$. A global system of object-image relations is obtained by taking all of these relations as $\sigma$ ranges over all permutations of $1, \ldots, r$ with $\sigma(6) < \cdots < \sigma(r)$, and as $\tau$ ranges over all permutations of $1, \ldots, r$ with $\tau(5) < \cdots < \tau(r)$. This system is still grossly overdetermined, and more work is being done to reduce the number of relations in it.
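The following sketch gives a sense of the size of this system. It is our illustration; the function name is hypothetical. It counts the pairs $(\sigma, \tau)$ allowed by the ordering conventions above and cross-checks the $\tau$ count for $r = 6$ by brute force. For $r = 6$ each pair contributes one relation, so this counts relations with repetition. Many of them coincide or are dependent, and the count says nothing about how many are actually needed.

```python
from itertools import permutations
from math import factorial


def count_normalizations(r):
    """Number of (sigma, tau) pairs with sigma(6) < ... < sigma(r) and
    tau(5) < ... < tau(r); sigma(1..5) and tau(1..4) are chosen freely."""
    sigmas = factorial(r) // factorial(r - 5)
    taus = factorial(r) // factorial(r - 4)
    return sigmas * taus


# Brute-force cross-check of the tau count for r = 6: tau(5) < tau(6).
taus = sum(1 for t in permutations(range(1, 7)) if t[4] < t[5])
print(taus, count_normalizations(6))  # 360 259200
```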


4. CONCLUSION


The next step is to give a concrete description of the shape spaces in the full perspective case. This will mean collapsing the associated variety $V(P_1, \ldots, P_r)$ to a point. One way to do this is via the Chow form and the Chow point of $V$ (see [8]). We would then realize the shape space as a quasi-projective variety in some projective space, where it will acquire a natural metric. This program is the subject of our current work.

While the object-image equations provide a test for matching, the metrics provide an even more robust approach to matching. For example, we often want to know whether two configurations of a fixed number of points in 2D or 3D are the same if we allow projective transformations. If they are, then we want a distance of zero. If not, we want a distance that expresses their dissimilarity, always recognizing that we can transform the points.

The Procrustes metric, described in the shape theory literature [6], [7], provides such a notion of distance for similarity transformations. However, it does not work for perspective transformations. Moreover, it is fixed in a particular dimension. It cannot be regarded as giving us a notion of "distance" between, say, a 3D configuration of points and a 2D configuration of points, where zero distance corresponds to the 2D points being a full perspective projection of the 3D points.

The metrics we developed in the affine case, by contrast, can be used to give a natural measure of object-image matching. These metrics also provide a rigorous foundation for error and statistical analysis in the object recognition problem. Similar metrics can be derived in the full perspective case using the approach mentioned. The details will be in our forthcoming papers.
Abstract

In this paper we survey some of the mathematical techniques that have led to useful new results in shape analysis, and their application to a variety of object recognition tasks. In particular, we will show how these techniques allow one to solve a number of fundamental problems related to object recognition for configurations of point features under a generalized weak perspective model of image formation. Our approach makes use of progress in shape theory. It includes the development of object-image equations for shape matching and the exploitation of shape space metrics (especially object-image metrics) to measure matching up to certain transformations.

This theory is built on advanced mathematical techniques from algebraic and differential geometry, which are used to construct generalized shape spaces for various projection and sensor models. That construction in turn is used to find natural metrics that express the distance (geometric difference) between two configurations of object features, two configurations of image features, or an object and an image pair. Such metrics are believed to produce the most robust tests for object identification, at least as far as the object's geometry is concerned. Moreover, these metrics provide a basis for efficient hashing schemes to do identification quickly, and a rigorous foundation for error and statistical analysis in any recognition system.

The most important feature of a shape theoretic approach is that all of the matching tests and metrics are independent of the choice of coordinates used to express the feature locations on the object or in the image. In addition, the approach is independent of the camera/sensor position and of any camera/sensor parameters. Finally, the method is also independent of object pose or image orientation. This is what makes the results so powerful.

Keywords:  shape  analysis,  object  recognition,  shape  space,  generalized  weak  perspective,  affine 
group,  shape  coordinates,  object-image  metric,  Riemannian  metric. 


1  Introduction 

A solution to the problem of single-view recognition is often a crucial first step in many target recognition and computer vision tasks. The single image of an object may be an optical image, a SAR image, or a radar range profile. Understanding how the information available in such an image relates to the target object's geometry is a key step in building reliable identification algorithms.

For example, without a priori knowledge of a sensor's viewpoint, an object's pose, or a sensor's parameters, it is difficult to efficiently recognize a three-dimensional arrangement of features (such as a geometric configuration of lines and/or points) on an object from a single two-dimensional view. What is needed is an approach that is invariant to changing viewpoints, adjustments in the sensor parameters, or changes in the object's pose. Unfortunately, existing methods all too often rely on computationally expensive template matching that is, strictly speaking, neither view nor pose invariant. Specifically, those methods use comparisons against templates created for each possible view and pose, with the infinite range of possibilities being approximated by some finite number of discrete views.

Fortunately, recent mathematical developments in the theory of shape provide an alternative. To carry out such an invariant, shape theoretic approach to target recognition, we need to seek out and exploit properties and relationships that are geometrically intrinsic to the objects and/or images being compared. Moreover, to develop this approach for different types of sensors, we must take into account the fact that each type requires a different model of image formation and therefore a different form of invariance. Radar and Ladar sensors require the use of an orthographic or scaled orthographic model. Most optical sensors will use either a weak perspective, a generalized weak perspective, or a full perspective model.

Once we understand, for various sensors, the contribution of a single image toward the recognition or recovery of the geometry/shape of the object, several further problems become more tractable:

- developing methods to integrate the information from multiple images taken by uncalibrated, distributed sensors of varying types;
- making use of a series of images taken by a single sensor of a moving object;
- understanding and creating flexible algorithms for objects that are not rigid but deformable, as is the case in many recognition problems related to biometric or medical applications (e.g., face recognition, detecting heart or tissue anomalies, gait recognition).

The requirement of view and pose invariance, as well as the desirability of a coordinate-independent formulation, leads us to start by characterizing a configuration of object or image features by its 3D, 2D, or 1D shape, a mathematical notion related to geometric invariance. The specific transformation group (Euclidean group, similarity or conformal group, affine group, or projective general linear group) to which things should be invariant will be a function of the sensor type. We then need a fundamental set of equations that expresses the relationship between the 3D geometry (shape) and its “residual” in a 2D (or 1D) image. These are known as object-image equations. They completely and invariantly describe the mutual 3D/2D (or 1D) constraints. These equations can be exploited in a number of ways. For example, from a given 2D configuration, one can determine a set of non-linear constraints on the shape (geometric invariants) of the 3D configurations capable of producing that given 2D configuration, and thus arrive at a test for determining the object being viewed. Conversely, given a 3D geometric configuration (features on an object), one can derive a set of equations that constrain the shapes of the images of that object, helping to determine whether that particular object appears in selected images.

The ultimate goal in all cases is to improve existing algorithms for target recognition and to develop new ones. Our approach in this paper uses advanced mathematical techniques from algebraic and differential geometry to construct generalized shape spaces for various projection and sensor models, and formulates the object-image equations in terms of global shape coordinates on these spaces. We then use the natural metrics on the shape spaces (which measure the dissimilarity between two object configurations, or between two image configurations, up to the allowed transformations) to find natural object-image metrics that express the distance (failure to match) between an object-image pair. A value of zero for these metrics means a match up to the relevant transformations and/or projections. These metrics are pose and view invariant and are expressed in coordinate-free terms. They produce the most robust tests for target identification, at least as far as target geometry is concerned. Moreover, such metrics provide the basis for efficient hashing schemes that identify targets quickly, and they also provide a rigorous foundation for error and statistical analysis in the ATR process.

Because of limited space, we content ourselves with introducing these ideas in the generalized weak perspective case, which models an optical sensor viewing an object in the far field. This case is the most mathematically tractable and complete. Details can be found in the references. For now, we introduce the theory and give some examples.


2  The  Generalized  Weak  Perspective  (Affine)  Case 

We  consider  r  points  in  space,  which  we  think  of  as  feature  points  on  some  object.  We  refer  to 
this  set  of  points  as  an  object  configuration.  Next,  we  “take  a  picture”  of  the  object  by  choosing 
a  plane  and  projecting  these  feature  points  into  that  plane.  We  will  call  this  set  of  points  in  the 
plane  an  image  configuration. 

In this section we will (1) identify the space of shapes, which are configurations modulo the action of a certain group of transformations on $\mathbb{R}^n$, $n = 2, 3$, and give global coordinates on the shape space, (2) give necessary and sufficient conditions for an image configuration to be a projection of an object configuration, and (3) define a natural metric on the shape spaces. For additional details see Arnold, Stiller, and Sturtz [2].

2.1  The  Generalized  Weak  Perspective  Projection 

The projections we will consider are called generalized weak perspective projections. If we represent points $P$ in $\mathbb{R}^n$ ($n = 2$ or $3$) in column form $P = (x_1, \dots, x_n, 1)^T$, then these projections take the form of a linear map $T$ from $\mathbb{R}^4$ to $\mathbb{R}^3$ given by a matrix

$$T = \begin{pmatrix} t_{11} & t_{12} & t_{13} & t_{14} \\ t_{21} & t_{22} & t_{23} & t_{24} \\ 0 & 0 & 0 & 1 \end{pmatrix},$$

where $T$ has rank 3.

Now let $A$ be an invertible $3 \times 3$ matrix and let $B$ be an invertible $4 \times 4$ matrix. It turns out that if $T$ is a generalized weak perspective projection, then $ATB$ is a generalized weak perspective projection if and only if $A$ and $B$ are affine transformations, i.e., $A$ and $B$ take the form

$$\begin{pmatrix} S & c \\ 0 & 1 \end{pmatrix}, \qquad c = (c_1, \dots, c_n)^T,$$

where $S \in GL(n)$ and $c_1, \dots, c_n \in \mathbb{R}$.

What does this mean in terms of our object and image configurations? Suppose $Q \in \mathbb{R}^2$ is the image of a point $P \in \mathbb{R}^3$ under a generalized weak perspective projection $T$, i.e., $Q = TP$. If we move $P$ by some affine transformation $B$ to another point $P' = BP$, and move $Q$ by an affine transformation $A$ to another point $Q' = AQ$, then $Q' = ATP = ATB^{-1}P'$. As a result, $Q'$ is the image of $P'$ under the generalized weak perspective projection $ATB^{-1}$ (since $A$ and $B^{-1}$ are both affine transformations).
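The composition argument above can be checked with exact arithmetic. The following Python sketch is only an illustration (the examples in Section 4 use Mathematica). It takes a hypothetical generalized weak perspective matrix $T$ and affine maps $A$ and $B$ chosen arbitrarily, and confirms two things: $ATB^{-1}$ keeps the last row $(0, 0, 0, 1)$, and it carries $P' = BP$ to $Q' = AQ$. The helper names `matmul`, `inverse`, and `affine` are our own.

```python
from fractions import Fraction

def matmul(X, Y):
    """Product of two matrices stored as lists of rows."""
    return [[sum(a * b for a, b in zip(row, col)) for col in zip(*Y)] for row in X]

def inverse(X):
    """Exact Gauss-Jordan inverse of an invertible square matrix."""
    n = len(X)
    aug = [[Fraction(v) for v in row] + [Fraction(int(i == j)) for j in range(n)]
           for i, row in enumerate(X)]
    for col in range(n):
        p = next(i for i in range(col, n) if aug[i][col] != 0)  # StopIteration if singular
        aug[col], aug[p] = aug[p], aug[col]
        piv = aug[col][col]
        aug[col] = [v / piv for v in aug[col]]
        for i in range(n):
            if i != col and aug[i][col] != 0:
                f = aug[i][col]
                aug[i] = [a - f * b for a, b in zip(aug[i], aug[col])]
    return [row[n:] for row in aug]

def affine(S, c):
    """Homogeneous affine matrix [[S, c], [0, 1]] for S in GL(n) and c in R^n."""
    n = len(S)
    return [list(S[i]) + [c[i]] for i in range(n)] + [[0] * n + [1]]

# Hypothetical generalized weak perspective projection: 3 x 4, last row (0, 0, 0, 1), rank 3.
T = [[1, -2, 0, 1],
     [0, 1, 3, -1],
     [0, 0, 0, 1]]
A = affine([[2, 1], [0, 3]], [5, -1])                     # affine map of the image plane
B = affine([[1, 0, 2], [0, 1, 0], [1, 1, 1]], [0, 4, 2])  # affine map of 3-space

T_new = matmul(matmul(A, T), inverse(B))   # the projection A T B^{-1}
P = [[1], [2], [3], [1]]                   # object point (1, 2, 3) as a homogeneous column
P_moved, Q_moved = matmul(B, P), matmul(A, matmul(T, P))

print(T_new[2] == [0, 0, 0, 1])            # True: still of generalized weak perspective form
print(matmul(T_new, P_moved) == Q_moved)   # True: Q' = A T B^{-1} P'
```

Since $A$ and $B$ are invertible, $ATB^{-1}$ also keeps rank 3. So the new matrix is again a generalized weak perspective projection.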

This  observation  shows  us  that  by  choosing  to  consider  generalized  weak  perspective  projections, 
the  best  that  we  can  hope  to  do  is  relate  object  configurations  to  image  configurations  up  to  affine 
transformations. 

2.2  The  Affine  Shape  Spaces 

As the preceding observation suggests, we should consider two configurations (object or image) equivalent if they differ by an affine transformation. In a sense, equivalent configurations are the same object or image, just rotated, translated, scaled, or otherwise moved by an affine transformation. Alternatively, we can view equivalent configurations as being the same object or image, but with their feature locations expressed in a different coordinate system. We would like to construct the space of configurations of $r$ points in $\mathbb{R}^n$ modulo the action of the group of affine transformations. These spaces would then represent the distinct objects and images independent of pose or view. To do this we must assume that the points in our configuration are non-coplanar for $n = 3$ or non-collinear for $n = 2$. This is reasonable, since a configuration of coplanar points in $\mathbb{R}^3$ would in fact be a configuration of points in $\mathbb{R}^2$ and would not represent a genuinely 3D object (and similarly for collinear points in the plane).

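Let $P_i = (x_{i,1}, \dots, x_{i,n})$ for $i = 1, \dots, r$, $r \ge n + 2$, be a configuration of $r$ non-coplanar (or non-collinear) points in $\mathbb{R}^n$, $n = 3$ (or $2$), and consider the $(n+1) \times r$ matrix

$$M = \begin{pmatrix} x_{1,1} & x_{2,1} & \cdots & x_{r,1} \\ x_{1,2} & x_{2,2} & \cdots & x_{r,2} \\ \vdots & \vdots & & \vdots \\ x_{1,n} & x_{2,n} & \cdots & x_{r,n} \\ 1 & 1 & \cdots & 1 \end{pmatrix}.$$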

Now to the configuration $P_1, \dots, P_r$ we associate an $(r-n-1)$-dimensional linear subspace $K^{r-n-1} \subset \mathbb{R}^r$. Specifically, $K^{r-n-1}$ is the null space of $M$ when we view $M$ as a linear map from $\mathbb{R}^r$ to $\mathbb{R}^{n+1}$. The fact that $K^{r-n-1}$ has dimension $r-n-1$ follows from the observation that $M$ has rank $n+1$ as a linear map. This holds because, by non-coplanarity (or non-collinearity), at least one $(n+1) \times (n+1)$ minor of $M$ has non-zero determinant.

The important thing to notice is that if we apply an affine transformation $A$ to our configuration, we obtain a new $(n+1) \times r$ matrix $M' = AM$. Because $A$ is invertible, the null space of $M'$ is exactly $K^{r-n-1}$, the null space of $M$. Moreover, the last row of $M$ consists of ones, so $K^{r-n-1} \subset H^{r-1} = \{(v_1, \dots, v_r) \in \mathbb{R}^r : \sum_i v_i = 0\}$. We may therefore assign to our configuration the unique point $[K^{r-n-1}] \in G(r-n-1, H^{r-1})$. This is the Grassmannian of $(r-n-1)$-dimensional subspaces in the $(r-1)$-dimensional space $H^{r-1}$, a well-understood compact manifold of dimension $n(r-n-1)$.
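Here is a minimal exact-arithmetic sketch of this construction. It uses a hypothetical configuration of five points in $\mathbb{R}^3$, not one of the examples from Section 4. The sketch computes the null space of $M$ from its reduced row echelon form and checks three things: the null space has dimension $r - n - 1 = 1$, it lies in $H^{r-1}$, and it is unchanged by an affine change of coordinates $M \mapsto AM$. The function names are illustrative only.

```python
from fractions import Fraction

def matmul(X, Y):
    """Product of two matrices stored as lists of rows."""
    return [[sum(a * b for a, b in zip(row, col)) for col in zip(*Y)] for row in X]

def null_space(M):
    """Basis of the null space of M, read off from its exact reduced row echelon form."""
    rows = [[Fraction(v) for v in row] for row in M]
    ncols = len(rows[0])
    pivots = []
    for c in range(ncols):
        r = len(pivots)
        if r == len(rows):
            break
        p = next((i for i in range(r, len(rows)) if rows[i][c] != 0), None)
        if p is None:
            continue  # no pivot in this column: it corresponds to a free variable
        rows[r], rows[p] = rows[p], rows[r]
        piv = rows[r][c]
        rows[r] = [v / piv for v in rows[r]]
        for i in range(len(rows)):
            if i != r and rows[i][c] != 0:
                f = rows[i][c]
                rows[i] = [a - f * b for a, b in zip(rows[i], rows[r])]
        pivots.append(c)
    basis = []
    for free in (c for c in range(ncols) if c not in pivots):
        v = [Fraction(0)] * ncols
        v[free] = Fraction(1)
        for i, pc in enumerate(pivots):
            v[pc] = -rows[i][free]
        basis.append(v)
    return basis

def config_matrix(points):
    """(n+1) x r matrix M whose columns are the points in homogeneous form (x_1, ..., x_n, 1)."""
    return [list(coords) for coords in zip(*points)] + [[1] * len(points)]

points = [(0, 0, 0), (1, 0, 0), (0, 1, 0), (0, 0, 1), (1, 1, 1)]  # r = 5 points in R^3
M = config_matrix(points)
K = null_space(M)
print(len(K), K[0] == [2, -1, -1, -1, 1], sum(K[0]))  # 1 True 0

A = [[2, 0, 1, 3], [0, 1, 0, -1], [1, 0, 1, 2], [0, 0, 0, 1]]  # affine map of R^3
print(null_space(matmul(A, M)) == K)                         # True
```

The reduced row echelon form depends only on the row space, and $AM$ has the same row space as $M$. So the same basis vector is returned exactly.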

Definition 2.1. We call the manifold $X = G(r-n-1, H^{r-1})$ the affine shape space for configurations of $r$ points in $\mathbb{R}^n$. If $n = 3$, we will call $X = G(r-4, r-1)$ affine object space (or just object space) and refer to points in this space as object shapes. If $n = 2$, we will call $Y = G(r-3, r-1)$ affine image space (or just image space) and refer to its elements as image shapes.

Every point in $X$ is of the form $[K^{r-n-1}]$ for some configuration $P_1, \dots, P_r \in \mathbb{R}^n$. Most importantly, if two configurations $P_1, \dots, P_r \in \mathbb{R}^n$ and $P'_1, \dots, P'_r \in \mathbb{R}^n$ give the same point in $X$, then they differ by an affine transformation.
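As a consistency check, the dimension of the shape space can be compared with a naive parameter count. A configuration of $r$ points in $\mathbb{R}^n$ has $nr$ coordinates, and the affine group of $\mathbb{R}^n$ has dimension $n^2 + n$:

$$\dim G(r-n-1, H^{r-1}) = (r-n-1)\big((r-1)-(r-n-1)\big) = n(r-n-1) = nr - (n^2 + n).$$

For $r = 5$, the object space is $X = G(1, H^4) \cong \mathbb{P}^3$, of dimension $3 \cdot 5 - 12 = 3$. The image space is $Y = G(2, H^4)$, of dimension $2 \cdot 5 - 6 = 4$.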

2.3 The Plücker Embedding

Since $X$ is a real manifold, we can find local coordinates for a point $[K] \in X$. However, we ultimately want to give relations that tell us when an image configuration is a projection of an object configuration, so it would be more convenient to find global coordinates on $X$. We may do so by mapping $X$ into a projective space via the Plücker embedding. In general, the Plücker embedding embeds a Grassmannian $G(n, r)$ ($n$-dimensional subspaces of an $r$-dimensional vector space $V^r$) in the projective space $\mathbb{P}(\bigwedge^{r-n} V^r) = \mathbb{P}^{\binom{r}{n}-1}$ as a projective variety, in the following way. Let $[K] \in G(n, r)$. Then $K$ is the intersection of $r - n$ hyperplanes in our vector space $V^r$, where each hyperplane is given by a linear form

$$\ell_i = \sum_{j=1}^{r} a_{i,j}\, e_j^*, \qquad i = 1, \dots, r-n,$$

where $e_1, \dots, e_r$ is a basis for $V^r$ and $e_1^*, \dots, e_r^*$ is the dual basis. More simply put, $K$ is the null space of the matrix

$$L = \begin{pmatrix} a_{1,1} & a_{1,2} & \cdots & a_{1,r} \\ \vdots & \vdots & & \vdots \\ a_{r-n,1} & a_{r-n,2} & \cdots & a_{r-n,r} \end{pmatrix}.$$

Now for each $1 \le i_1 < i_2 < \dots < i_{r-n} \le r$ we define $[i_1, i_2, \dots, i_{r-n}]$ to be the determinant of the $(r-n) \times (r-n)$ minor of $L$ whose columns are columns $i_1, i_2, \dots, i_{r-n}$ of $L$, i.e.,

$$[i_1, i_2, \dots, i_{r-n}] = \det \begin{pmatrix} a_{1,i_1} & \cdots & a_{1,i_{r-n}} \\ \vdots & & \vdots \\ a_{r-n,i_1} & \cdots & a_{r-n,i_{r-n}} \end{pmatrix}.$$

The Plücker embedding is now defined to be the map

$$\phi_{n,r} : G(n, r) \to \mathbb{P}^{\binom{r}{n}-1}, \qquad [K] \mapsto \big([1, 2, \dots, r-n] : \dots : [n+1, n+2, \dots, r]\big) \quad \text{(all minors)},$$

and the homogeneous coordinates of $\phi_{n,r}([K])$ are called the Plücker coordinates of $K$.

It is important to note that this map does not depend on our choice of hyperplanes, but it does depend on our choice of basis for $V^r$. Choosing different hyperplanes replaces $L$ by $GL$ for some invertible $G$, which multiplies every minor by the same non-zero factor $\det G$. We should also note that this map does in fact embed $G(n, r)$ as a closed projective variety in $\mathbb{P}^{\binom{r}{n}-1}$. In other words, $\phi_{n,r}(G(n, r))$ is the zero locus of some system of homogeneous polynomials $f_1, \dots, f_s$ in the variables $x_{1,2,\dots,r-n}, \dots, x_{n+1,\dots,r}$, with coefficients in the base field of $V^r$. We use the variable $x_{i_1,\dots,i_{r-n}}$ to indicate that the corresponding coordinate of $\phi_{n,r}([K])$ is $[i_1, \dots, i_{r-n}]$. The equations $f_i = 0$, $1 \le i \le s$, are known as the Plücker relations (see [4] or [5]).
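The smallest non-trivial case is $G(2, 4)$: two-dimensional subspaces of $\mathbb{R}^4$, cut out by $r - n = 2$ hyperplanes. It has six Plücker coordinates and a single Plücker relation, $x_{12}x_{34} - x_{13}x_{24} + x_{14}x_{23} = 0$. The Python sketch below uses an arbitrarily chosen matrix $L$ and hypothetical helper names. It computes the coordinates as maximal minors, then shows that replacing $L$ by $GL$ (a different choice of hyperplanes) rescales every coordinate by $\det G$. Finally, it checks the relation.

```python
from itertools import combinations

def det(m):
    """Determinant by cofactor expansion along the first row (fine for small matrices)."""
    if len(m) == 1:
        return m[0][0]
    return sum((-1) ** j * a * det([row[:j] + row[j + 1:] for row in m[1:]])
               for j, a in enumerate(m[0]))

def matmul(X, Y):
    """Product of two matrices stored as lists of rows."""
    return [[sum(a * b for a, b in zip(row, col)) for col in zip(*Y)] for row in X]

def plucker(L):
    """Maximal minors [i_1, ..., i_k] of a k x r matrix, keyed by 1-based column tuples."""
    k, r = len(L), len(L[0])
    return {tuple(c + 1 for c in cols): det([[row[c] for c in cols] for row in L])
            for cols in combinations(range(r), k)}

# K = null space of L: a 2-dimensional subspace of R^4 cut out by two hyperplanes.
L = [[1, 0, 2, -1],
     [0, 1, 1, 3]]
x = plucker(L)
print(x)  # {(1, 2): 1, (1, 3): 1, (1, 4): 3, (2, 3): -2, (2, 4): 1, (3, 4): 7}

# Another choice of hyperplanes for the same K: every coordinate scales by det G = 3.
G = [[2, 1], [1, 2]]
print(plucker(matmul(G, L)) == {key: 3 * v for key, v in x.items()})  # True

# The single Plücker relation defining G(2, 4) inside P^5.
print(x[1, 2] * x[3, 4] - x[1, 3] * x[2, 4] + x[1, 4] * x[2, 3])  # 0
```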

One way to give global coordinates on $X = G(r-n-1, H^{r-1})$ would be to embed $X$ into the projective space $\mathbb{P}^{\binom{r-1}{n}-1}$ via the Plücker embedding $\phi_{r-n-1, r-1}$. However, this would require us to choose a basis for $H^{r-1}$. Fortunately, there is a very natural way to avoid this problem.


Since $K^{r-n-1} \subset H^{r-1} \subset \mathbb{R}^r$, we may view $X$ as a submanifold of $G(r-n-1, r)$. In that case $\phi_{r-n-1,r}$ embeds $X$ in $\mathbb{P}^{\binom{r}{n+1}-1}$ as a subvariety of $\phi_{r-n-1,r}(G(r-n-1, r))$. Because $K^{r-n-1}$ is the null space of $M$, the matrix $M$ plays the role of $L$ above. Thus, under this map, a configuration $P_i = (x_{i,1}, \dots, x_{i,n})$, $i = 1, \dots, r$, is mapped into $\mathbb{P}^{\binom{r}{n+1}-1}$ by taking the determinants of all the maximal minors of our original feature point matrix $M$.

Embedding our shape space $X$ into $\mathbb{P}^{\binom{r}{n+1}-1}$ in this fashion is, in some sense, a more natural way to give global coordinates on $X$ than embedding it into $\mathbb{P}^{\binom{r-1}{n}-1}$. This method lets us work directly with the matrix determined by our configuration. We are not forced to choose a basis for $H^{r-1}$ and then rewrite our basis for $K^{r-n-1}$ in terms of it.


Definition 2.2. Given a configuration $P_1, \dots, P_r \in \mathbb{R}^n$, we will refer to the Plücker coordinates of $K^{r-n-1}$, viewed as a subspace of $\mathbb{R}^r$ (rather than $H^{r-1}$), as the shape coordinates of the configuration $P_1, \dots, P_r$.
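Shape coordinates are therefore just the maximal minors of $M$. The Python sketch below uses a hypothetical five-point configuration in $\mathbb{R}^3$; `shape_coordinates` is an illustrative name. It lists the five $4 \times 4$ minors lexicographically. It then shows that an affine motion $x \mapsto Sx + c$ multiplies all of them by $\det S$, so the projective point (the shape) is unchanged.

```python
from itertools import combinations

def det(m):
    """Determinant by cofactor expansion along the first row (exact for integers)."""
    if len(m) == 1:
        return m[0][0]
    return sum((-1) ** j * a * det([row[:j] + row[j + 1:] for row in m[1:]])
               for j, a in enumerate(m[0]))

def shape_coordinates(points):
    """Maximal minors of the homogeneous point matrix M, in lexicographic column order."""
    M = [list(c) for c in zip(*points)] + [[1] * len(points)]
    k, r = len(M), len(M[0])
    return [det([[row[c] for c in cols] for row in M]) for cols in combinations(range(r), k)]

points = [(0, 0, 0), (1, 0, 0), (0, 1, 0), (0, 0, 1), (1, 1, 1)]
coords = shape_coordinates(points)
print(coords)  # [-1, -1, 1, -1, -2], i.e. [1234], [1235], [1245], [1345], [2345]

# Apply x -> S x + c with det S = 3: every coordinate is multiplied by 3.
S, c = [[2, 0, 1], [0, 1, 0], [1, 0, 2]], [3, -1, 2]
moved = [tuple(sum(S[i][j] * p[j] for j in range(3)) + c[i] for i in range(3)) for p in points]
print(shape_coordinates(moved) == [3 * v for v in coords])  # True
```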


2.4  The  Object/Image  Relations 

Given an object configuration $P_1, \dots, P_r$ and an image configuration $Q_1, \dots, Q_r$, we want to give necessary and sufficient conditions (the object-image relations) for the $Q_i$ to be a generalized weak perspective projection of the $P_i$. Recall that we view our object space $X$ as a subvariety of $\mathbb{P}^{\binom{r}{4}-1}$ and our image space $Y$ as a subvariety of $\mathbb{P}^{\binom{r}{3}-1}$. Consider the set $V$ of pairs $(K, L)$ where $L$ is an image shape that comes from a generalized weak perspective projection of the object shape $K$ (the so-called set of matching object-image pairs). We want to view it as a subvariety $V \subset X \times Y \subset \mathbb{P}^{\binom{r}{4}-1} \times \mathbb{P}^{\binom{r}{3}-1}$. Therefore, our object-image relations should be a system of bihomogeneous polynomials in the object and image shape coordinates whose zero locus is precisely $V$.

Recall that our object shapes are linear subspaces $K^{r-4} \subset \mathbb{R}^r$ of dimension $r-4$, and our image shapes are linear subspaces $L^{r-3} \subset \mathbb{R}^r$ of dimension $r-3$. The following theorem relates object and image shapes under generalized weak perspective projection.

Theorem 2.3. Let $P_1, \dots, P_r$ be an object configuration with corresponding object shape $K^{r-4}$, and let $Q_1, \dots, Q_r$ be an image configuration with corresponding image shape $L^{r-3}$. Then the $Q_i$ are a generalized weak perspective projection of the $P_i$ if and only if

$$K^{r-4} \subset L^{r-3} \subset H^{r-1} \subset \mathbb{R}^r.$$


This  fact  and  the  incidence  relations  given  in  Theorem  f>  §5,  Chapter  VII  of  Hodge  and  Pedoe 
[5] give us our object-image relations.

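Theorem 2.4. Let $P_i = (x_i, y_i, z_i)$, $1 \le i \le r$, be an object configuration with corresponding matrix

$$M = \begin{pmatrix} x_1 & x_2 & \cdots & x_r \\ y_1 & y_2 & \cdots & y_r \\ z_1 & z_2 & \cdots & z_r \\ 1 & 1 & \cdots & 1 \end{pmatrix},$$

and let $Q_i = (u_i, v_i)$, $1 \le i \le r$, be an image configuration with corresponding matrix

$$N = \begin{pmatrix} u_1 & u_2 & \cdots & u_r \\ v_1 & v_2 & \cdots & v_r \\ 1 & 1 & \cdots & 1 \end{pmatrix}.$$

For $1 \le i_1 < i_2 < i_3 < i_4 \le r$ and $1 \le j_1 < j_2 < j_3 \le r$, define the object shape coordinates

$$m_{i_1 i_2 i_3 i_4} = \det \begin{pmatrix} x_{i_1} & x_{i_2} & x_{i_3} & x_{i_4} \\ y_{i_1} & y_{i_2} & y_{i_3} & y_{i_4} \\ z_{i_1} & z_{i_2} & z_{i_3} & z_{i_4} \\ 1 & 1 & 1 & 1 \end{pmatrix}$$

and the image shape coordinates

$$n_{j_1 j_2 j_3} = \det \begin{pmatrix} u_{j_1} & u_{j_2} & u_{j_3} \\ v_{j_1} & v_{j_2} & v_{j_3} \\ 1 & 1 & 1 \end{pmatrix}.$$

Then the points $Q_1, \dots, Q_r$ are the images of $P_1, \dots, P_r$ under a generalized weak perspective projection if and only if

$$\sum_{1 \le \lambda_1 < \lambda_2 \le r} \epsilon_{\lambda_1 \lambda_2}\, m_{\alpha_1 \alpha_2 \lambda_1 \lambda_2}\, n_{\gamma_1 \gamma_2 \gamma_3} = 0$$

for all choices of $1 \le \alpha_1 < \alpha_2 \le r$ and $1 \le \beta_1 < \beta_2 < \dots < \beta_{r-5} \le r$. Here, when $\lambda_1, \lambda_2, \beta_1, \dots, \beta_{r-5}$ are distinct, $1 \le \gamma_1 < \gamma_2 < \gamma_3 \le r$ is their complement in $\{1, \dots, r\}$; otherwise $n_{\gamma_1 \gamma_2 \gamma_3} = 0$. The sign $\epsilon_{\lambda_1 \lambda_2}$ is the sign of the permutation $\gamma_1, \gamma_2, \gamma_3, \lambda_1, \lambda_2, \beta_1, \dots, \beta_{r-5}$ of the numbers $1, \dots, r$. The expressions $m_{\alpha_1 \alpha_2 \lambda_1 \lambda_2}$ and $n_{\gamma_1 \gamma_2 \gamma_3}$ should be treated as skew-symmetric in their indices.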


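As an example, consider the case $r = 5$. We pick $\alpha_1 = 1$, $\alpha_2 = 2$; no $\beta$'s are required. Our formula becomes

$$\sum_{1 \le \lambda_1 < \lambda_2 \le 5} \epsilon_{\lambda_1 \lambda_2}\, m_{12\lambda_1\lambda_2}\, n_{\gamma_1\gamma_2\gamma_3} = 0,$$

where $\gamma_1 \gamma_2 \gamma_3$ is the complement of $\lambda_1, \lambda_2$ in $\{1, 2, 3, 4, 5\}$. Since $m_{12\lambda_1\lambda_2} = 0$ whenever an index repeats, only $\lambda_1, \lambda_2 \in \{3, 4, 5\}$ contribute. This yields $m_{1234} n_{125} - m_{1235} n_{124} + m_{1245} n_{123} = 0$. We get 10 such equations as we vary $\alpha_1$ and $\alpha_2$:

$$\begin{aligned}
0 &= m_{1234} n_{125} - m_{1235} n_{124} + m_{1245} n_{123} \\
0 &= m_{1234} n_{135} - m_{1235} n_{134} + m_{1345} n_{123} \\
0 &= m_{1234} n_{145} - m_{1245} n_{134} + m_{1345} n_{124} \\
0 &= m_{1235} n_{145} - m_{1245} n_{135} + m_{1345} n_{125} \\
0 &= m_{1234} n_{235} - m_{1235} n_{234} + m_{2345} n_{123} \\
0 &= m_{1234} n_{245} - m_{1245} n_{234} + m_{2345} n_{124} \\
0 &= m_{1235} n_{245} - m_{1245} n_{235} + m_{2345} n_{125} \\
0 &= m_{1234} n_{345} - m_{1345} n_{234} + m_{2345} n_{134} \\
0 &= m_{1235} n_{345} - m_{1345} n_{235} + m_{2345} n_{135} \\
0 &= m_{1245} n_{345} - m_{1345} n_{245} + m_{2345} n_{145}
\end{aligned}$$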
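The ten equations can be checked mechanically. The Python sketch below uses hypothetical helper names and a made-up configuration. For $r = 5$ it evaluates the ten left-hand sides, following the pattern of the list above: for each pair $(a, b)$ with complement $c < d < e$, the terms are $m_{abde}n_{abc} - m_{abce}n_{abd} + m_{abcd}n_{abe}$, with indices sorted. It then compares the result with the containment test of Theorem 2.3. For $r = 5$, $K^1$ is spanned by the signed maximal minors of $M$ (Cramer's rule), so the test reduces to checking that this vector lies in the null space of $N$. Both criteria accept an exact generalized weak perspective view and reject a view with one image point moved.

```python
from itertools import combinations

def det(m):
    """Determinant by cofactor expansion along the first row (exact for integers)."""
    if len(m) == 1:
        return m[0][0]
    return sum((-1) ** j * a * det([row[:j] + row[j + 1:] for row in m[1:]])
               for j, a in enumerate(m[0]))

def homogeneous(points):
    """Matrix whose columns are the points in homogeneous form (coordinates, then 1)."""
    return [list(c) for c in zip(*points)] + [[1] * len(points)]

def shape_coords(points):
    """Maximal minors keyed by sorted 1-based column tuples: m_{i1i2i3i4} or n_{j1j2j3}."""
    M = homogeneous(points)
    return {cols: det([[row[c - 1] for c in cols] for row in M])
            for cols in combinations(range(1, len(points) + 1), len(M))}

def idx(*i):
    return tuple(sorted(i))

def object_image_residuals(obj_pts, img_pts):
    """Left-hand sides of the ten r = 5 object-image equations, one per pair (a, b)."""
    m, n = shape_coords(obj_pts), shape_coords(img_pts)
    residuals = []
    for a, b in combinations(range(1, 6), 2):
        c, d, e = [i for i in range(1, 6) if i not in (a, b)]
        residuals.append(m[idx(a, b, d, e)] * n[idx(a, b, c)]
                         - m[idx(a, b, c, e)] * n[idx(a, b, d)]
                         + m[idx(a, b, c, d)] * n[idx(a, b, e)])
    return residuals

def contains_object_shape(obj_pts, img_pts):
    """Theorem 2.3 for r = 5: the vector k spanning K must lie in L, the null space of N."""
    M, N = homogeneous(obj_pts), homogeneous(img_pts)
    k = [(-1) ** i * det([row[:i] + row[i + 1:] for row in M]) for i in range(len(M[0]))]
    return all(sum(x * y for x, y in zip(row, k)) == 0 for row in N)

obj = [(0, 0, 0), (1, 0, 0), (0, 1, 0), (0, 0, 1), (1, 1, 1)]
img = [(x - 2 * y + 1, y + 3 * z - 1) for x, y, z in obj]  # a generalized weak perspective view
bad = img[:4] + [(0, 4)]                                   # last image point moved
print(object_image_residuals(obj, img))                    # [0, 0, 0, 0, 0, 0, 0, 0, 0, 0]
print(contains_object_shape(obj, img), contains_object_shape(obj, bad))  # True False
```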


3  Metrics 

How far apart are two object shapes or two image shapes? Since the shape spaces are Grassmannians, we can use the natural Riemannian metric on these manifolds (which reduces to the Fubini-Study metric when the Grassmannian is a projective space) to define distances (see Arias, Edelman, and Smith [1]).


3.1 A Riemannian Metric on the Object Shape Space and on the Image Shape Space

Consider two objects, i.e., two $r$-tuples $P_1, \dots, P_r$ and $\tilde P_1, \dots, \tilde P_r$ of points in $\mathbb{R}^3$. We define the distance between them, or more specifically the distance between their shapes $K^{r-4}$ and $\tilde K^{r-4}$, as follows. First we choose orthonormal bases for $K^{r-4}$ and $\tilde K^{r-4}$ as subspaces of $\mathbb{R}^r$. We arrange those vectors as the columns of two $r \times (r-4)$ matrices $K$ and $\tilde K$ with orthonormal columns. We then compute the singular values of the $(r-4) \times (r-4)$ matrix $K^T \tilde K$ and denote by $\theta_i$ ($i = 1, \dots, r-4$) the arc cosines of the singular values. These angles are the so-called principal angles between the subspaces.

Definition 3.1. The affine shape distance in object space between two $r$-tuples of object feature points is defined to be

$$d_{obj}(K^{r-4}, \tilde K^{r-4}) = \sqrt{\sum_{i=1}^{r-4} (\arccos \lambda_i)^2},$$

where the $\lambda_i$ are the singular values of $K^T \tilde K$ for the orthonormal matrices $K$ and $\tilde K$ created by choosing orthonormal bases of the subspaces $K^{r-4}$ and $\tilde K^{r-4}$ in $\mathbb{R}^r$. See below for examples.
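Here is a pure-Python sketch of Definitions 3.1 and 3.2, assuming the inputs are spanning vectors of the two subspaces. Orthonormal bases come from Gram-Schmidt. The singular values of $K^T \tilde K$ are obtained as square roots of the eigenvalues of $(K^T \tilde K)(K^T \tilde K)^T$, computed by cyclic Jacobi rotations. These are our substitutions for a library SVD (such as the Mathematica routines used in Section 4), chosen only to keep the example dependency-free. All function names are hypothetical.

```python
import math

def orthonormal_basis(vectors, tol=1e-12):
    """Gram-Schmidt with one re-orthogonalization pass; dependent vectors are dropped."""
    basis = []
    for v in vectors:
        w = [float(x) for x in v]
        for _ in range(2):
            for q in basis:
                dot = sum(a * b for a, b in zip(w, q))
                w = [a - dot * b for a, b in zip(w, q)]
        norm = math.sqrt(sum(a * a for a in w))
        if norm > tol:
            basis.append([a / norm for a in w])
    return basis

def symmetric_eigenvalues(S, max_sweeps=50):
    """Eigenvalues of a small symmetric matrix by cyclic Jacobi rotations."""
    a = [list(map(float, row)) for row in S]
    n = len(a)
    for _ in range(max_sweeps):
        if sum(a[i][j] ** 2 for i in range(n) for j in range(n) if i != j) < 1e-30:
            break
        for p in range(n - 1):
            for q in range(p + 1, n):
                if a[p][q] == 0.0:
                    continue
                theta = (a[q][q] - a[p][p]) / (2.0 * a[p][q])
                t = math.copysign(1.0, theta) / (abs(theta) + math.sqrt(theta * theta + 1.0))
                c = 1.0 / math.sqrt(t * t + 1.0)
                s = t * c
                for k in range(n):  # A <- A J (columns p and q)
                    akp, akq = a[k][p], a[k][q]
                    a[k][p], a[k][q] = c * akp - s * akq, s * akp + c * akq
                for k in range(n):  # A <- J^T A (rows p and q)
                    apk, aqk = a[p][k], a[q][k]
                    a[p][k], a[q][k] = c * apk - s * aqk, s * apk + c * aqk
    return [a[i][i] for i in range(n)]

def principal_angles(U, V):
    """Principal angles between span(U) and span(V), given spanning vectors in R^r."""
    Q1, Q2 = orthonormal_basis(U), orthonormal_basis(V)
    C = [[sum(a * b for a, b in zip(q1, q2)) for q2 in Q2] for q1 in Q1]  # K^T K~
    CCt = [[sum(a * b for a, b in zip(ri, rj)) for rj in C] for ri in C]
    k = min(len(Q1), len(Q2))
    sq_sing = sorted(symmetric_eigenvalues(CCt), reverse=True)[:k]  # squared singular values
    # Clamp into [0, 1] so that round-off cannot push arccos out of its domain.
    return [math.acos(min(1.0, math.sqrt(max(0.0, x)))) for x in sq_sing]

def shape_distance(U, V):
    """Square root of the sum of the squared principal angles."""
    return math.sqrt(sum(t * t for t in principal_angles(U, V)))

U = [(1, 0, 0), (0, 1, 0)]
V = [(1, 1, 0), (0, 1, 1)]
print([round(t, 4) for t in principal_angles(U, V)])  # [0.0, 0.9553]
```

The second angle is $\arccos(1/\sqrt{3})$. Angles that should be exactly zero typically come out around $10^{-8}$, because arccos is ill-conditioned near 1.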

Definition 3.2. Given two $r$-tuples of points $Q_1, \dots, Q_r$ and $\tilde Q_1, \dots, \tilde Q_r$ in the plane representing certain image features, we define the affine shape distance in image space between them to be

$$d_{im}(L^{r-3}, \tilde L^{r-3}) = \sqrt{\sum_{i=1}^{r-3} (\arccos \mu_i)^2},$$

where the $\mu_i$ are the singular values of $L^T \tilde L$ for orthonormal matrices $L$ and $\tilde L$ created by choosing orthonormal bases for $L^{r-3}$ and $\tilde L^{r-3}$ in $\mathbb{R}^r$. See below for examples.

We remark that these distances are the natural metric distances on the shape spaces $X$ and $Y$, which are the Grassmann manifolds $\mathrm{Gr}_{\mathbb{R}}(r-4, H^{r-1})$ and $\mathrm{Gr}_{\mathbb{R}}(r-3, H^{r-1})$. This is because these spaces are geodesic submanifolds of $\mathrm{Gr}_{\mathbb{R}}(r-4, r)$ and $\mathrm{Gr}_{\mathbb{R}}(r-3, r)$, respectively.

3.2 A “Metric” Measure of Matching Between an Object Feature Set and an Image Feature Set

Finally, we can compute a “distance” between an object $[K^{r-4}] \in X$ and an image $[L^{r-3}] \in Y$. This can be done in two ways. First, working in object space $X$, we get

$$d_{oi}^{X}([K^{r-4}], [L^{r-3}]) = \min_{\tilde K^{r-4}} d_{obj}(K^{r-4}, \tilde K^{r-4}),$$

where $\tilde K^{r-4}$ runs over all objects capable of producing the image $L^{r-3}$, i.e., all subspaces $\tilde K^{r-4} \subset L^{r-3}$. Second, working in image space $Y$, we get

$$d_{oi}^{Y}([K^{r-4}], [L^{r-3}]) = \min_{\tilde L^{r-3}} d_{im}(L^{r-3}, \tilde L^{r-3}),$$

where $\tilde L^{r-3}$ runs over all images of the object $K^{r-4}$, i.e., all subspaces $\tilde L^{r-3} \subset H^{r-1}$ which contain $K^{r-4}$.

In both cases these values work out to be the square root of the sum of the squares of the principal angles between $K^{r-4}$ and $L^{r-3}$. These angles are computed from the arc cosines of the singular values of $L^T K$ in the same manner as above.

Theorem 3.3 (Object/Image Metric Duality). The distance between a set of object features $P_1, \dots, P_r$ and a set of image features $Q_1, \dots, Q_r$ can be computed in two ways. In object space, it is the minimum affine shape distance between $P_1, \dots, P_r$ and all object $r$-tuples which are capable of being projected to $Q_1, \dots, Q_r$ (via a generalized weak perspective projection). In image space, it is the minimum affine shape distance between $Q_1, \dots, Q_r$ and all generalized weak perspective projections of $P_1, \dots, P_r$. Moreover, these two minimums are equal, i.e., $d_{oi}^{X} = d_{oi}^{Y}$.

Of course, $d_{oi} = 0$ if and only if $P_1, \dots, P_r$ can be projected to $Q_1, \dots, Q_r$ via a generalized weak perspective projection. We remark that analogs of this result can be proved for other projection models.

4  Examples 

In this section we create a number of examples and provide the Mathematica code needed to implement some of the results of the paper. Additional code to generate the shape space equations and the object-image equations can be obtained from the author. The examples below involve the case of five ($r = 5$) feature points projected from 3D to 2D under generalized weak perspective projection (also known as the affine case).




4.1  Object  Data 

We begin by creating three (arbitrary) 3D objects. Each object is described by point features written as the columns of a 4 by $r$ matrix in so-called homogeneous form $(x, y, z, 1)$. We also list, in lexicographic order, the determinants of its 4 by 4 minors (the so-called Plücker or shape coordinates). Some of the syntax here is taken from Mathematica commands.

ObjectDataW = …
Minors[ObjectDataW, 4] = {{7, …, -2}}

ObjectDataX = …
Minors[ObjectDataX, 4] = {{-2, 3/2, 5/2, …}}

ObjectDataY = …
Minors[ObjectDataY, 4] = {{3, 10, -2, -8, 1}}

4.2 Group Actions


We now use rotation, translation, scale, and reflection matrices to create object data equivalent to (i.e., having the same shape as) ObjectDataY above.

ScaleAndReflect = …
RotAndTransl = …

ObjectDataZ1 = RotAndTransl.ObjectDataY = …
Minors[ObjectDataZ1, 4] = {{3, 10, -2, -8, 1}}

ObjectDataZ2 = ScaleAndReflect.RotAndTransl.ObjectDataY = …
Minors[ObjectDataZ2, 4] = {{-18, -60, 12, 48, -6}}

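Note that ObjectDataY, ObjectDataZ1, and ObjectDataZ2 all have, up to scale, the same 5-vector of Plücker (shape) coordinates, because they are just affine transformations of the same object. Each 4 by 4 minor of $AM$ equals $\det A$ times the corresponding minor of $M$. So RotAndTransl has determinant 1, and the factor $-6$ is the determinant of ScaleAndReflect.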

4.3  Projections 

Now we create some generalized weak perspective projection matrices taking us from 3D to 2D.

projection1 = {{…, -2, 0, 1}, {-1, 0, 1, 3}, {0, 0, 0, 1}}
projection2 = …
projection3 = {{0, 1, 0, 0}, {0, 0, 1, 0}, {0, 0, 0, 1}}

Finally, we form a number of images using these projections and the object data above.

ImageData1Y = projection1.ObjectDataY = …
Minors[ImageData1Y, 3] = …

ImageData2Y = projection2.ObjectDataY = …
Minors[ImageData2Y, 3] = …

ImageData3W = projection3.ObjectDataW = …
Minors[ImageData3W, 3] = …

Notice that ImageData1X and ImageData3W have the same image shape. They differ by a scale factor of 2, and their Plücker (shape) coordinates are the same up to a factor of 4 (scaling both image coordinates by 2 multiplies each 3 by 3 minor by $2 \cdot 2 = 4$). Thus we have an instance of two different, i.e., inequivalent, object shapes producing the same image shape. Likewise, one can see that the images produced by a particular object, say ObjectDataX, can have different (inequivalent) image shapes. For example, ImageData1X and ImageData2X are not the same shape, because their vectors of shape coordinates are not scalar multiples of each other. This illustrates the many-to-one and one-to-many nature of the relationship between object shapes and image shapes.
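The test used above is that two shapes agree exactly when their shape-coordinate vectors are non-zero scalar multiples of each other. This amounts to the vanishing of every $2 \times 2$ determinant $u_i v_j - u_j v_i$. Here is a short Python sketch with a hypothetical object and two views of our own choosing. The two views of one object have different image shapes, while the Section 4.2 coordinates of ObjectDataY and ObjectDataZ2 pass the proportionality test.

```python
from itertools import combinations

def det(m):
    """Determinant by cofactor expansion along the first row (exact for integers)."""
    if len(m) == 1:
        return m[0][0]
    return sum((-1) ** j * a * det([row[:j] + row[j + 1:] for row in m[1:]])
               for j, a in enumerate(m[0]))

def shape_coordinates(points):
    """Maximal minors of the homogeneous point matrix, in lexicographic column order."""
    M = [list(c) for c in zip(*points)] + [[1] * len(points)]
    return [det([[row[c] for c in cols] for row in M])
            for cols in combinations(range(len(points)), len(M))]

def same_shape(u, v):
    """For non-zero shape-coordinate vectors: True iff they are proportional."""
    return all(u[i] * v[j] == u[j] * v[i] for i, j in combinations(range(len(u)), 2))

obj = [(0, 0, 0), (1, 0, 0), (0, 1, 0), (0, 0, 1), (1, 1, 1)]
view1 = [(x, y) for x, y, z in obj]                          # orthographic view along z
view2 = [(x - 2 * y + 1, y + 3 * z - 1) for x, y, z in obj]  # another generalized weak perspective view
print(same_shape(shape_coordinates(view1), shape_coordinates(view2)))  # False: one object, two image shapes
print(same_shape([3, 10, -2, -8, 1], [-18, -60, 12, 48, -6]))         # True: ObjectDataY vs ObjectDataZ2
```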


4.4  Metrics 

The next commands in Mathematica, while complicated looking, just do three things. They create an orthonormal basis for the row span of each data matrix, multiply two such matrices together, and then find the singular values of the resulting 4 by 4, 3 by 3, or 4 by 3 (3 by 4) matrix. The arc cosines of these singular values are the so-called principal angles between the subspaces spanned by the two sets of rows. The square root of the sum of the squared principal angles is the value of the metric. The row span of a data matrix is the orthogonal complement of its shape subspace (its null space). Two subspaces of equal dimension have the same non-zero principal angles as their orthogonal complements. So in the case of object space and of image space, this is the distance provided by the natural Riemannian metric. Of course, you are free to scale the metric. One obvious scaling is to set the total volume of the shape space equal to one, so that volume can be associated with a probability measure.


4.5  Object-Object  Metric 

What follows is a Mathematica function that computes the distance between two objects in object space. Note that the code does not depend on the number of feature points or the dimensionality of the points.

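(* Rows of each q factor form an orthonormal basis for the row span of a data matrix; *)
(* the arc cosines of the singular values of q1.Transpose[q2] are the principal angles. *)
ObjectSpaceMetric[Obj1_, Obj2_] :=
  Norm[ArcCos[SingularValueList[
    N[QRDecomposition[Transpose[Obj1]][[1]] . Transpose[QRDecomposition[Transpose[Obj2]][[1]]]],
    Tolerance -> 0]]]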

As an example, let's compute some distances.

ObjectSpaceMetric[ObjectDataW, ObjectDataX] = 0.0654569
ObjectSpaceMetric[ObjectDataX, ObjectDataY] = 1.54176
ObjectSpaceMetric[ObjectDataY, ObjectDataZ2] = 2.58096 × 10^-8


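Notice that this says ObjectDataW and ObjectDataX are fairly close, but that ObjectDataX and ObjectDataY are relatively far apart. Of course, ObjectDataY and ObjectDataZ2 are really zero distance apart, because they differ by an affine transformation of 3-space. The small non-zero value is floating-point round-off: the arc cosine of a singular value that differs from 1 only by rounding error is of order $10^{-8}$.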

4.6  Image-Image  Metric 

We now introduce the metric in image space. Note that it is given by the same code; only the input sizes have changed.

ImageSpaceMetric[Im1_, Im2_] := ObjectSpaceMetric[Im1, Im2];

Let’s  now  compute  some  distances  in  image  space  between  our  image  shapes. 

ImageSpaceMetric[ImageData1X, ImageData1X] = 0.
ImageSpaceMetric[ImageData1X, ImageData2X] = 0.408449
ImageSpaceMetric[ImageData2X, ImageData1X] = 0.408449
ImageSpaceMetric[ImageData2X, ImageData2X] = 0.

The  above  computation  illustrates  that  our  metric  is  symmetric  and  of  course  gives  zero  distance 
between  an  image  and  itself.  It  also  shows  again  that  these  two  images  of  ObjectDataX  are  not 
equivalent  and  cannot  be  transformed  one  to  the  other  by  an  affine  transformation  of  2-space. 

ImageSpaceMetric[ImageData1Y, ImageData2Y] = 0.813806
ImageSpaceMetric[ImageData1X, ImageData1Y] = 1.14686
ImageSpaceMetric[ImageData1X, ImageData2Y] = 0.641144
ImageSpaceMetric[ImageData2X, ImageData1Y] = 1.22873
ImageSpaceMetric[ImageData2X, ImageData2Y] = 0.775138
ImageSpaceMetric[ImageData1X, ImageData3W] = 0.

All of the images compared are inequivalent, i.e., they are distinct shapes, except for ImageData1X and ImageData3W, which we previously observed have the same shape.

4.7  Object-Image  Metric 

Finally  we  introduce  the  object-image  metric  as  a  fundamental  way  to  compare  the  matching 
of  object  data  with  image  data. 

ObjectImageMetric[Obj_, Im_] := ObjectSpaceMetric[Obj, Im];

This  is  in  fact  the  metric  discussed  in  the  text,  although  a  proof  of  that  fact  requires  some  work. 
Examples: 


ObjectImageMetric[ObjectDataX, ImageData1Y] = 1.14218
ObjectImageMetric[ObjectDataX, ImageData1X] = 1.49012 × 10^-8
ObjectImageMetric[ObjectDataX, ImageData2X] = 2.10734 × 10^-8

Note that the latter two are zero (up to floating-point round-off) because the image data really is a generalized weak perspective projection of the object data. The object-image metric will evaluate to zero if and only if there exists a generalized weak perspective projection which carries the object data to the image data.


Final  example: 


ObjectImageMetric[ObjectDataX, ImageData3W] = 1.49012 × 10^-8
ObjectImageMetric[ObjectDataW, ImageData1X] = 2.10734 × 10^-8
ObjectImageMetric[ObjectDataW, ImageData2X] = 0.0259829

Of course, the first two of these are zero (up to round-off), because ImageData3W and ImageData1X are the same image shape and are projections of ObjectDataW and ObjectDataX, respectively. However, there is no way to project ObjectDataW to another of ObjectDataX’s images (namely ImageData2X), because our object-image “distance” in that case is not zero. The closer the object-image metric is to zero, the closer some projection of the given object will be to the given image.

Object-Image  Relations 

Finally, let's check an object-image equation. For ObjectDataX we have [1234] = -2, [1235] = 3/2, [1245] = 5/2. For ImageData1X we have [123] = -5, [124] = -1, [125] = 41/2. The first object-image equation is [1234][125] - [1235][124] + [1245][123] = 0, and the reader can check that it indeed vanishes. We leave it as an exercise to check the vanishing of the other nine object-image equations (see above). Note that knowing one image of an object imposes linear relations on the shape coordinates of the object. If we have sufficiently many independent views of the object, we can solve for its shape, which determines the object uniquely up to an affine transformation of 3-space. Mathematica code is available to generate the object-image equations and the defining equations of the shape spaces inside the appropriate projective space.